Re: libprocstat(3): retrieve process command line args and environment
The updated patch set: http://people.freebsd.org/~trociny/procstat_core.4.patch It includes changes discussed with Kostik. New NT_PROCSTAT_PSSTRINGS and NT_PROCSTAT_AUXV notes are added. libprocstat(3) is extended with functions to retrieve env, args and auxv (so the patch that started this thread a couple of months ago has been merged to this patch set too). procstat(1) is fully converted to work only via libprocstat(3), all its options are supported for core dumps, except kernel stacks, which are not useful in this case. It looks for me to be in a good shape now and I am planning to start committing things in a week or two, after some additional testing, if there is no objections or other suggestions. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
On Sun, Mar 31, 2013 at 04:40:47PM +0300, Konstantin Belousov wrote: > I inspected imgact_elf.c:parse_note(), imgact_elf.c:putnote() and > rtld.c:digest_notes(). Only putnote() uses 8-byte alignment. > Every other OS and our !coredump code assumes 4-byte alignment. Thanks! > Does changing the putnote() to align on the 4-byte boundary cause > real change in the core file notes layout ? Currently, we store only 4 types of notes in a core file: #define NT_PRSTATUS 1 /* Process status. */ #define NT_FPREGSET 2 /* Floating point registers. */ #define NT_PRPSINFO 3 /* Process state info. */ #define NT_THRMISC 7 /* Thread miscellaneous info. */ I checked the sizes of structures inserted into the notes, and on amd64 they all are multiple of 8: (kgdb) p sizeof(prpsinfo_t) % 8 $1 = 0 (kgdb) p sizeof(prstatus_t) % 8 $2 = 0 (kgdb) p sizeof(prfpregset_t) % 8 $3 = 0 (kgdb) p sizeof(thrmisc_t) % 8 $4 = 0 so both 4-byte and 8-byte aligned. I believe that the patch below will not change the current core file notes layout, will make things consistent in our tree, and will make adding my procstat notes easier, if I use 4-byte alignment. Are you ok if I commit it before introducing my changes? Index: sys/kern/imgact_elf.c === --- sys/kern/imgact_elf.c (revision 248706) +++ sys/kern/imgact_elf.c (working copy) @@ -1538,10 +1538,10 @@ __elfN(putnote)(void *dst, size_t *off, const char *off += sizeof note; if (dst != NULL) bcopy(name, (char *)dst + *off, note.n_namesz); - *off += roundup2(note.n_namesz, sizeof(Elf_Size)); + *off += roundup2(note.n_namesz, sizeof(Elf32_Size)); if (dst != NULL) bcopy(desc, (char *)dst + *off, note.n_descsz); - *off += roundup2(note.n_descsz, sizeof(Elf_Size)); + *off += roundup2(note.n_descsz, sizeof(Elf32_Size)); } static boolean_t Also, shouldn't we update then the following comment in sys/elf_common.h? /* * Note header. The ".note" section contains an array of notes. Each * begins with this header, aligned to a word boundary. Immediately * following the note header is n_namesz bytes of name, padded to the * next word boundary. Then comes n_descsz bytes of descriptor, again * padded to a word boundary. The values of n_namesz and n_descsz do * not include the padding. */ -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
On Fri, Mar 29, 2013 at 11:22:45AM +0200, Konstantin Belousov wrote: > On Thu, Mar 28, 2013 at 11:18:21PM +0200, Mikolaj Golub wrote: > > On Thu, Mar 28, 2013 at 12:51:34PM +0200, Konstantin Belousov wrote: > > > > > In the generic Elf 64bit draft specification I have, the notes sections > > > are specified to consists of entries, each of which is an array of 8-byte > > > words. I think we are right using the 8-byte alignment. > > > > I have impression many implementations use 4-byte alignment. E.g. in > > NetBSD: > > > > sys/kern/core_elf32.c: > > > > #define ELFROUNDSIZE4 /* XXX Should it be sizeof(Elf_Word)? */ > > #define elfround(x) roundup((x), ELFROUNDSIZE) > Note that this is core_elf32. I am concerned with the 64-bit cores. core_elf64.c: #define ELFSIZE 64 #include "core_elf32.c" > > > > Also, we have inconsistency with imgactl_elf.c/parse_notes(), which > > uses 4-byte alignment: > > > > note = (const Elf_Note *)((const char *)(note + 1) + > > roundup2(note->n_namesz, sizeof(Elf32_Addr)) + > > roundup2(note->n_descsz, sizeof(Elf32_Addr))); > > > > I suppose there were no issues before, because accidentally the sizes > > of all notes we had were 8 bytes aligned. > Indeed, both ABI and NOINIT notes have size which is multiple of 8. > > > > > Now, when I add new notes it will break things. I don't have strong > > opinion, it will be ok for me to leave 8-byte alignment and fix > > issues, just want to have strong support here :-) > Well, while the issue is discussed and decided, you could just make > your new notes size be multiple of 8 too. I thought about this too. Then I need to be more caerful when extracting stats from notes, because the length returned by procstat_core_get() can be more than a real payload. Ok, I will try this way. I could add length to the note header, which is currently contains only structsize, so it would became something like: struct { int structsize; int lenght; } But not sure it is worth doing, especially if the forced 8-bit alignment is a temporary mesure. > > > > BTW, looking at NetBSD code I see they set p_align in the note > > segement to ELFROUNDSIZE: > > > > /* Write out the PT_NOTE header. */ > > ws.psections->p_type = PT_NOTE; > > ws.psections->p_offset = notestart; > > ws.psections->p_vaddr = 0; > > ws.psections->p_paddr = 0; > > ws.psections->p_filesz = notesize; > > ws.psections->p_memsz = 0; > > ws.psections->p_flags = PF_R; > > ws.psections->p_align = ELFROUNDSIZE; > > > > while we set to 0: > > > > /* The note segement. */ > > phdr->p_type = PT_NOTE; > > phdr->p_offset = hdrsize; > > phdr->p_vaddr = 0; > > phdr->p_paddr = 0; > > phdr->p_filesz = notesz; > > phdr->p_memsz = 0; > > phdr->p_flags = 0; > > phdr->p_align = 0; > You mean, for the core dumps ? yes > > > > Shouldn't we set it to alignment size too? Note also, they set > > "Segment is readable" flag. > I think both changes are fine. > > I skimmed over the usermode parts of the patch. One thing I did not liked > is the mis-handling of the read() return values. If there is short read, > the errno value is meaningless, but warn() would still append it to > the message. I suggest to explicitely distinguish -1 and >= 0 returns > from reads. ok. Thanks. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
On Thu, Mar 28, 2013 at 12:51:34PM +0200, Konstantin Belousov wrote: > In the generic Elf 64bit draft specification I have, the notes sections > are specified to consists of entries, each of which is an array of 8-byte > words. I think we are right using the 8-byte alignment. I have impression many implementations use 4-byte alignment. E.g. in NetBSD: sys/kern/core_elf32.c: #define ELFROUNDSIZE4 /* XXX Should it be sizeof(Elf_Word)? */ #define elfround(x) roundup((x), ELFROUNDSIZE) Also, we have inconsistency with imgactl_elf.c/parse_notes(), which uses 4-byte alignment: note = (const Elf_Note *)((const char *)(note + 1) + roundup2(note->n_namesz, sizeof(Elf32_Addr)) + roundup2(note->n_descsz, sizeof(Elf32_Addr))); I suppose there were no issues before, because accidentally the sizes of all notes we had were 8 bytes aligned. Now, when I add new notes it will break things. I don't have strong opinion, it will be ok for me to leave 8-byte alignment and fix issues, just want to have strong support here :-) BTW, looking at NetBSD code I see they set p_align in the note segement to ELFROUNDSIZE: /* Write out the PT_NOTE header. */ ws.psections->p_type = PT_NOTE; ws.psections->p_offset = notestart; ws.psections->p_vaddr = 0; ws.psections->p_paddr = 0; ws.psections->p_filesz = notesize; ws.psections->p_memsz = 0; ws.psections->p_flags = PF_R; ws.psections->p_align = ELFROUNDSIZE; while we set to 0: /* The note segement. */ phdr->p_type = PT_NOTE; phdr->p_offset = hdrsize; phdr->p_vaddr = 0; phdr->p_paddr = 0; phdr->p_filesz = notesz; phdr->p_memsz = 0; phdr->p_flags = 0; phdr->p_align = 0; Shouldn't we set it to alignment size too? Note also, they set "Segment is readable" flag. > > > > 4) In libprocstat I added new functions and placed them under already > > existent FBSD_1.3 version section in Symbol.map. > > > > Shouldn't I bump the version? Won't I need any additional care if I > > want to MFC the code to stable/9 and may be 8? > Version of what ? MFC does not require any additional actions, FBSD_1.3 > is the valid version namespace in 9 and 8. Ok. Now I see it was rather silly question :-). Thanks. For this and your other notes. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
Here is an updated patch set, which I think includes all kib's suggestions. It also introduces procstat groups, umask, rlimit, and osrel notes. http://people.freebsd.org/~trociny/procstat_core.3.patch Sbuf section interface looks like below: /* start a section */ sbuf_start_section(sb, NULL); /* write something to sbuf */ ... /* start a subsection (need to save the current section length) */ sbuf_start_section(sb, &old_len); /* write something to sbuf */ ... /* end a subsection padding to 4 bytes with '\0' bytes (need to provide the previously saved section length) */ sbuf_end_section(sb, old_len, 4, 0); ... /* aling the whole section to page size */ sbuf_end_section(sb, -1, PAGE_SIZE, 0); Open issues/questions: 1) I would also like to make libprocstat(3) extract environment, args, and auxv from the core. It looks like I don't need to store envv and argv in notes, as it is already present in the core. But I think I need to know psstrings to find them. Would it be ok, if I add auxv and psstrings notes, and extract envv and agrv from a program section in the core? 2) I started NT_PROCSTAT constants from the first not used number in elf_common.h, i.e. 8. But in this case they conflict with those available on other systems: contrib/binutils/include/elf/common.h: #define NT_PSTATUS 10 /* Has a struct pstatus */ #define NT_FPREGS 12 /* Has a struct fpregset */ #define NT_PSINFO 13 /* Has a struct psinfo */ #define NT_LWPSTATUS16 /* Has a struct lwpstatus_t */ #define NT_LWPSINFO 17 /* Has a struct lwpsinfo_t */ #define NT_WIN32PSTATUS 18 /* Has a struct win32_pstatus */ Although note name = "FreeBSD" should have disambiguated this, readelf looks like ignores this and its output for my core on i386 looks confusing: Owner Data size Description FreeBSD 0x006c NT_PRPSINFO (prpsinfo structure) FreeBSD 0x0068 NT_PRSTATUS (prstatus structure) FreeBSD 0x00b0 NT_FPREGSET (floating point registers) FreeBSD 0x0018 NT_THRMISC (thrmisc structure) FreeBSD 0x0304 Unknown note type: (0x0008) FreeBSD 0x0a6c Unknown note type: (0x0009) FreeBSD 0x08d4 NT_PSTATUS (pstatus structure) FreeBSD 0x000c Unknown note type: (0x000b) FreeBSD 0x0006 NT_FPREGS (floating point registers) FreeBSD 0x00d4 NT_PSINFO (psinfo structure) FreeBSD 0x0008 Unknown note type: (0x000e) Should I use some other range for NT_PROCSTAT notes? 3) Following our current code I align notes to sizeof(Elf_Size), which is 4 on i386 and 8 on amd64. But I have an issue reading the notes by readelf on amd64, which alway uses 4 byte alignment: contrib/binutils/binutils/readelf.c: next = (Elf_External_Note *)(inote.descdata + align_power (inote.descsz, 2)); where #define align_power(addr, align)\ (((addr) + ((bfd_vma) 1 << (align)) - 1) & ((bfd_vma) -1 << (align))) Should I change alignment to 4 bytes? 4) In libprocstat I added new functions and placed them under already existent FBSD_1.3 version section in Symbol.map. Shouldn't I bump the version? Won't I need any additional care if I want to MFC the code to stable/9 and may be 8? -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
On Sun, Mar 17, 2013 at 08:30:33AM +0200, Konstantin Belousov wrote: > On Sun, Mar 17, 2013 at 12:35:20AM +0200, Mikolaj Golub wrote: > > A KPI that would be natural for my case: > > > > /* start a section that is going to be aligned to sizeof(Elf_Size), > > using byte '0' for padding */ > > sbuf_padded_start(sb, sizeof(Elf_Size), 0); > > /* write something to sbuf */ > > sbuf_bcat(sb, data, len); > > /* align the sbuf section */ > > sbuf_pad(sb); > > > > This might look a little awkward and would add some overhead for the normal > > case though... > This looks fine, in fact. You might want to call it sbuf_start_section() > and sbuf_end_section() ? Ok, will try this way. Thanks. ... > All you need is to reset req->oldidx. This could be done outside the > sbuf interface, in the top level function implementing the sysctl ? I am afraid at this level I don't know a value to reset req->oldidx to. Reseting it to 0 I think is not a good solution? > What you propose in the follow-up message should work too, I do not > have any preference. Ok. Thanks. > > > Indents after the else clauses in kern_proc_out() are wrong. > > > > Do you mean indents after '#ifdef COMPAT_FREEBSD32' block? I did it > > that way so if COMPAT_FREEBSD32 sections were removed from the code > > the indentation would be correct. I saw this practice through the code > > and used it myself before. > The sections are not going to be removed. IMHO code should be formatted > as if the preprocessor directive lines are not present. Could you point > out an example of existing code consistent with your indentation ? In kern/kern_proc.c my code (get_proc_vector, sysctl_kern_proc_auxv) has such indentation. There were no objection when I introduced it, so I thought it was a right way. But surely I didn't invented such indentation myself. I don't recall if I used some particular examples as a reference then, but here are several examples of such indentation found by quick grep: net/bpf.c-#ifdef BPF_JITTER net/bpf.c- bf = bpf_jitter_enable != 0 ? d->bd_bfilter : NULL; net/bpf.c- if (bf != NULL) net/bpf.c- slen = (*(bf->func))(pkt, pktlen, pktlen); net/bpf.c- else net/bpf.c:#endif net/bpf.c- slen = bpf_filter(d->bd_rfilter, pkt, pktlen, pktlen); kern/kern_conf.c-#if 0 kern/kern_conf.c- if (dev->si_usecount == 0 && kern/kern_conf.c- (dev->si_flags & SI_CHEAPCLONE) && (dev->si_flags & SI_NAMED)) kern/kern_conf.c- ; kern/kern_conf.c- else kern/kern_conf.c:#endif kern/kern_conf.c- if (dev->si_devsw == NULL && dev->si_refcount == 0) { kern/kern_jail.c- if (SV_PROC_FLAG(td->td_proc, SV_ILP32)) { kern/kern_jail.c- uint32_t hid32 = pr->pr_hostid; kern/kern_jail.c- kern/kern_jail.c- error = vfs_setopt(opts, "host.hostid", &hid32, sizeof(hid32)); kern/kern_jail.c- } else kern/kern_jail.c:#endif kern/kern_jail.c- error = vfs_setopt(opts, "host.hostid", &pr->pr_hostid, netinet6/raw_ip6.c- /* Do not inject data into pcb. */ netinet6/raw_ip6.c- INP_RUNLOCK(last); netinet6/raw_ip6.c- } else netinet6/raw_ip6.c:#endif /* IPSEC */ netinet6/raw_ip6.c- if (last != NULL) { netinet6/raw_ip6.c- if (last->inp_flags & INP_CONTROLOPTS || On the other hand there are many examples where indentation is used in the way you prefer. I don't have any strong opinion about this so I will do in the way you suggest. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
On Sun, Mar 17, 2013 at 12:35:20AM +0200, Mikolaj Golub wrote: > Ah, this is a thing I wanted to discuss but forgot! As I understand > the idea of the 'ABI hack' is: if the output buffer is less than the > size of data we have, truncate our data to the last successfully > written kinfo_file structure and return without error. > > In my code I do reset ENOMEM to 0 (see sysctl_kern_proc_filedesc), but > I don't see a way how using sbuf interface I can truncate req->oldidx > to the last successfully written file: sbuf_bcat() (to internal > buffer) may be successful and the error might appear only later, when > draining. I suspect it will break things if I return with a partial > kinfo_file, but haven't come with a solution yet... A solution I am going to try is to provide maxlen argument to kern_proc_filedesc_out(), and if it is not 0, output files that do not exceed the limit, so sysctl_kern_proc_filedesc would call: kern_proc_filedesc_out(p, &sb, req->oldlen); -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
On Sat, Mar 16, 2013 at 09:16:05PM +0200, Konstantin Belousov wrote: > IMO sbuf_pad() should be moved to subr_sbuf.c. I find the KPI of > the sbuf_pad() wrong. You have two separate number, the amount to > pad to, and the current amount. Natural interface would take the > two numbers separate instead of the difference. Also, could the > sbuf_pad() deduce the current amount on its own, this would be the > best ? Hm, I am not sure about this. So are you proposing instead of something like this sbuf_pad(sb, roundup(x, y) - x, 0); to have sbuf_pad(sb, x, roundup(x, y), 0)? Although it is a little easier to write, it looks less intuitive for me. E.g. I have troubles how to document this and explain. I can't reffer x as a current position in sbuf, because it might not be. It is just a some position, roundup(x,y) is another one, and only their difference makes sence for sbuf_pad, so why not just to provide this difference? So sbuf_pad(sb, from, to, c); looks for me less intutive than sbuf_pad(sb, len, c); A KPI that would be natural for my case: /* start a section that is going to be aligned to sizeof(Elf_Size), using byte '0' for padding */ sbuf_padded_start(sb, sizeof(Elf_Size), 0); /* write something to sbuf */ sbuf_bcat(sb, data, len); /* align the sbuf section */ sbuf_pad(sb); This might look a little awkward and would add some overhead for the normal case though... > In register_note(), put spaces around '|' for the malloc line. > > It seems that you did not moved the 'ABI hack' for ENOMEM situation for > sysctl_kern_proc_filedesc() into the rewritten function. > Ah, this is a thing I wanted to discuss but forgot! As I understand the idea of the 'ABI hack' is: if the output buffer is less than the size of data we have, truncate our data to the last successfully written kinfo_file structure and return without error. In my code I do reset ENOMEM to 0 (see sysctl_kern_proc_filedesc), but I don't see a way how using sbuf interface I can truncate req->oldidx to the last successfully written file: sbuf_bcat() (to internal buffer) may be successful and the error might appear only later, when draining. I suspect it will break things if I return with a partial kinfo_file, but haven't come with a solution yet... > Please commit the changes to use pget() in the sysctl handlers separately. > > Indents after the else clauses in kern_proc_out() are wrong. Do you mean indents after '#ifdef COMPAT_FREEBSD32' block? I did it that way so if COMPAT_FREEBSD32 sections were removed from the code the indentation would be correct. I saw this practice through the code and used it myself before. > Since you are moving the KERN_PROC_ZOMBMASK out of kern_proc.c, > a comment is needed to describe its usage. I would find it very > confusing if grep reveals no like of code that sets the flags. > > In the comment for sbuf_drain_core_output(), s/drainig/draining/, > s/sefely/safely/ and s/hold/call us with the process lock held/. > > I do not see much sense in the kstack note. The core is dumped when > the threads are moved into the safe state in the kernel, so you > cannot catch 'living' stack backtraces for the kernel side of > things. I was afraid of it after I had tried it on real dumps :-( Ok, will remove the kstack note. > On the other hand, things like umask, resources and osrel might be > of the interest for post-morted analysis. This is in my TODO list. > I did not looked at the usermode changes. Thanks for your suggestions! Will do them. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
On Wed, Feb 20, 2013 at 09:58:02PM +0200, Mikolaj Golub wrote: > On Wed, Feb 20, 2013 at 09:04:14AM -0500, John Baldwin wrote: > > > The process should be stopped by the time we dump a core, so running it > > multiple times should be ok in that the sizes should not change. I would > > say that you should try to implement a "determine sizes" pass that doesn't > > allocate anything, but others should comment on that. > > I had a little talk with kib about this recently. Kib's main concern > looked to be that a process with many threads/open files might require > considerable amount of kernel memory if the procstat notes are > prepared in memory before writing. So currently I am working on > another approach, when on the first pass the sizes are found, and on > the second pass procstat notes are written to coredump without > preliminarily storing all notes in memory buffer. Hope, the code won't > look very ugly... Here is an updated patch: http://people.freebsd.org/~trociny/procstat_core.2.patch - The coredump routines are modified to be able to write notes directly to a core file, via sbuf interface, without preliminary preparing them all in a memory buffer. - To write NT_PROCSTAT_* notes, the corresponding sysctl routines are changed to provide kern_proc_filedesc_out(), kern_proc_kstack_out(), kern_proc_out(), and kern_proc_vmmap_out() functions. - libprocstat(3) is extended to extract procstat notes from a process core file. Also new functions (procstat_getvmmap, procstat_kstack) are added. - procstat(1) is changed to use libprocstat(3) where it is possible and to treat non-numeric command line arguments as core files. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
On Wed, Feb 20, 2013 at 09:04:14AM -0500, John Baldwin wrote: > The process should be stopped by the time we dump a core, so running it > multiple times should be ok in that the sizes should not change. I would > say that you should try to implement a "determine sizes" pass that doesn't > allocate anything, but others should comment on that. I had a little talk with kib about this recently. Kib's main concern looked to be that a process with many threads/open files might require considerable amount of kernel memory if the procstat notes are prepared in memory before writing. So currently I am working on another approach, when on the first pass the sizes are found, and on the second pass procstat notes are written to coredump without preliminarily storing all notes in memory buffer. Hope, the code won't look very ugly... > One other thing to consider is if gcore needs to be updated to output these > records as well. It could use the sysctls to fetch the data and then write > out appropriate notes I think, so perhaps it wouldn't be too difficult to add > this as a followup commit once the kernel version has settled and the file > format is set? Looks like very interesting idea! Thank you for all your comments and suggestions. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
the format first, but I would be glad if somebody helps me with the problem I faced linking libelf to libprocstat. With the modifications to the libprocstat makefile: -DPADD= ${LIBKVM} ${LIBUTIL} -LDADD= -lkvm -lutil +DPADD= ${LIBELF} ${LIBKVM} ${LIBUTIL} +LDADD= -lelf -lkvm -lutil buildworld fails with the error: make: don't know how to make /scratch2/tmp/trociny/obj/i386.i386/home/trociny/svn/base/head/tmp/usr/lib/libelf.a. Stop -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
On Wed, Jan 23, 2013 at 11:31:43AM -0500, John Baldwin wrote: > On Wednesday, January 23, 2013 2:25:00 am Mikolaj Golub wrote: > > IMHO, after adding procstat_getargv and procstat_getargv, the usage of > > kvm_getargv() and kvm_getenvv() (at least in the new code) may be > > deprecated. As this is stated in the man page, BUGS section, "these > > routines do not belong in the kvm interface". I suppose they are part > > of libkvm because there was no a better place for them. procstat(1) > > prefers direct sysctl to them (so, again, code duplication, which I am > > going to remove adding procstat_getargv/envv). > > Hmm, are you going to rewrite ps(1) to use libprocstat? Or rather, is that a > goal someday? That is one current consumer of kvm_getargv/envv. That might > be fine if we want to make more tools use libprocstat instead of using libkvm > directly. I didn't have any plans for ps(1) :-) That is why I wrote about "new code". But if you think it is good to do I might look at it one day... -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
On Tue, Jan 22, 2013 at 02:17:39PM -0800, Stanislav Sedov wrote: > > On Jan 22, 2013, at 1:48 PM, John Baldwin wrote: > > > > Well, you could make procstat open a kvm handle in both cases (open a > > "live" > > handle in the procstat_open_sysctl() case). It just seems rather silly to > > be > > duplicating code in the two interfaces. In this particular case I prefer code duplication to opening a kvm handle in procstat_open_sysctl(), as it looks a bit confusing. But I can do this way if the agreement is reached. > > More a question for Robert: does > > libprocstat intentionally duplicate the code in libkvm for other things as > > well in the live case? (Like fetching the list of processes?) > > > It does not actually has a duplicate code, the code for fetching the list of > processes via sysctl is different from the KVM case. The open file > descriptors > processing is different as well. Because libprocstat implements almost the > same functionality both for sysctl and mvm backends, it can be used to analyze > both the live system and the kernel crash dumps. The code Mikolaj proposed > only implements the sysctl backend currently, so it does not seem to have > any relation to KVM, so it will be a bit weird to make it open a KVM handle > though it does not use it. IMHO, after adding procstat_getargv and procstat_getargv, the usage of kvm_getargv() and kvm_getenvv() (at least in the new code) may be deprecated. As this is stated in the man page, BUGS section, "these routines do not belong in the kvm interface". I suppose they are part of libkvm because there was no a better place for them. procstat(1) prefers direct sysctl to them (so, again, code duplication, which I am going to remove adding procstat_getargv/envv). -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: libprocstat(3): retrieve process command line args and environment
On Tue, Jan 22, 2013 at 12:01:06PM -0500, John Baldwin wrote: > How is this different from kvm_getargv()? It seems to be a direct copy. libprocstat(3) is a frontend for sysctl(3) and kvm(3) interfaces, so it is good to extend it to cover "getarg/env" functionality. Yes the functions look similar to kvm_getargv() but I couldn't implement them just as wrappers around kvm_getargv(): I would like to have libprocstat functions thread safe, while kvm_getargv() uses static variables for its internal buffers. It looks like I could fix kvm_getargv() to use fields of kvm structure instead of static variables to store pointers to the buffers, and then use it in libprocstat(3). Do you think it is worth doing? BTW, struct __kvm already contains some pointers, which looks like are unused currently: char**argv; /* (dynamic) storage for argv pointers */ int argc; /* length of above (not actual # present) */ char*argbuf;/* (dynamic) temporary storage */ But if I even had kvm_getargv() to behave as I wanted, there is still an issue with using it in libprocstat(): to get kvm structure you need to initialize procstat using procstat_open_kvm(). It is supposed to call procstat_open_kvm() when you want to read from kernel memory, while kvm_getargv() uses sysctl. So from a user point of you it would be a litle confusing if she had to call procstat_open_kvm() to get runtime args and env. If she wanted e.g. to get both runtime args and file info (via sysctl) she would have to do procstat_open_kvm() for args and procstat_open_sysctl() for files. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
libprocstat(3): retrieve process command line args and environment
Hi, Some time ago Stanislav Sedov suggested to me extending libprocstat(3) with functions to retrieve process command line arguments and environment variables. In the first approach I tried, the newly added functions procstat_getargv/getenvv allocated a buffer of necessary size, stored the values and returned to the caller: http://people.freebsd.org/~trociny/libprocstat.1.patch The problem with this approach was that when I updated procstat(1) to use this interface, I observed noticeable performance degradation (about 30% on systems with MALLOC_PRODUCTION off), due to memory allocation overhead: the original procstat(1) reuses the buffer for all its retrievals. So my second approach was to add internal buffers to struct procstat, which are used by procstat_getargv/getenvv to store values and reused on the subsequent call: http://people.freebsd.org/~trociny/libprocstat.2.patch The drawback of this approach is that a user has to take care and remember that a subsequent call rewrites argument vector obtained from the previous call. On the other hand this is ok for typical use cases while does not add allocation overhead, so I like this approach more. I would like to commit this second patch, if there are no objections or suggestions how to improve the things. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Changing `iostat -Ix' output
Hi, I don't like very much what `iostat -Ix' outputs and would like to change this. A typical output: % iostat -Ix extended device statistics device r/i w/ikr/ikw/i qlen svc_t %b ada0 5599136.0 3953193.0 39982760.0 86819866.50 15.1 16 cd0 552.0 0.0 4.3 0.00 0.0 0 pass0333.0 0.0 166.5 0.00 0.0 0 pass1 2.0 0.0 1.0 0.00 0.0 0 Parameters like r/i, kr/i (total io operations/kbytes) are very useful. They allow to use `iostat -Ix' to collect IO statistics running it periodically (from cron or some monitoring tool) and calculate average amount of operations or bytes per second at the specified period subtracting the current value from the previous one and dividing by time period. But you can't do the same with % busy, which is very useful IO characteristics. Average % busy at the specified period could be calculated storing total busy time for the device at time t1, total busy time at t2 and then subtracting the last value from the first (to get busy time at this period) and dividing by the time period. Currently iostat(8) does not provide 'total busy time' statistics. I use sysutils/devstat for this but it would be nice if iostat(8) itself provide such functionality. I propose for `iostat -Ix` to output total busy time instead of % busy, and also total duration of transactions instead of average duration (to be able to calculate average duration for the period between two iostat runs). http://people.freebsd.org/~trociny/iostat.total_busy_time.1.patch Average duration and % busy are still available via `iostat -x`. Here is an output example % ./iostat -Ix; sleep 60; ./iostat -Ix extended device statistics device r/i w/i kr/i kw/i qlen tdur sb ada0 5599785.0 3961913.0 39985960.5 86902385.50 144055.5 35966.5 cd0554.0 0.0 4.3 0.000.0 9.5 pass0 336.0 0.0168.0 0.000.0 17.5 pass12.0 0.0 1.0 0.000.0 0.0 extended device statistics device r/i w/i kr/i kw/i qlen tdur sb ada0 5599922.0 3963177.0 40002608.0 86958230.50 144074.4 35970.5 cd0554.0 0.0 4.3 0.000.0 9.5 pass0 336.0 0.0168.0 0.000.0 17.5 pass12.0 0.0 1.0 0.000.0 0.0 So, for ada0, % busy for that period was 100 * (35970.5 - 35966.5) / 60 = 6. And service time (assuming that only read and write operations were serviced) was 1000 * (144074.4 - 144055.5) / (5599922 - 5599785 + 3963177 - 3961913) = 13.4 msec. What do you think about this? -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: SuperPages utilization survey
On Sat, 9 Jun 2012 12:07:40 +0300 Konstantin Belousov wrote: KB> Well, if I see a report informing me that some 2M region contains 512 super KB> pages, how should I interpret it ? For me, it is only one superpage (mapping) KB> that can be created in one 2M region. Well, if I see a report like below: PID STARTEND PRTRES PRES SUP REF SHD FL TP PATH 485680x800c00x820c0 rw- 1310720 51712 2 0 --S df it tells me that for the region 0x800c0-0x820c0 (512Mb) we have 131072 * 4k = 512Mb resident and 51712 * 4k = 202Mb (a litle less than a half of the region) promoted (mapped) to superpages. If I had number of superpages here I would need additional knowledge (a superpage size) to calculate how effectively superpages are used. But actually, no much difference for me. To get a number of superpages is it enough just to divide the result obtained counting normal-sized pages by (2M/4k) factor? -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: SuperPages utilization survey
On Sat, 9 Jun 2012 11:38:22 +0300 Konstantin Belousov wrote: KB> On Sat, Jun 09, 2012 at 10:31:17AM +0300, Mikolaj Golub wrote: >> >> On Fri, 1 Jun 2012 14:54:48 +0200 Ivan Voras wrote: >> >> IV> On 1 June 2012 14:35, Wojciech Puchar >> wrote: >> >>> http://people.freebsd.org/~ivoras/stuff/spsurvey.py >> >> ... >> >> IV> If anyone posts more data, I'll analyse it. I'm more worried about the >> IV> granularity of procstat, where it marks the entire region if a single >> IV> superpage exists in it - it means any such analysis is only >> IV> approximate. >> >> Here is a patch (for kernel and procstat) that allows to see amount of pages >> mapped to superpages. >> >> http://people.freebsd.org/~trociny/procstat-superpages.cnt.1.patch >> >> Not sure it is useful enough to be committed. KB> Superpage aggregates mappings for several normal-sized pages. KB> As a consequence, when you iterate over small pages in KB> sysctl_kern_proc_vmmap(), you account each superpage as many time as KB> much constituent small pages it contains. This is exactly what my intention was to count: how much memory are handled by superpages (using normal-sized page as a measurement unit), not amount of superpages. And I think this is what Ivan wanted to know. Do you think it is better to return number of superpages? -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: SuperPages utilization survey
On Fri, 1 Jun 2012 14:54:48 +0200 Ivan Voras wrote: IV> On 1 June 2012 14:35, Wojciech Puchar wrote: >>> http://people.freebsd.org/~ivoras/stuff/spsurvey.py ... IV> If anyone posts more data, I'll analyse it. I'm more worried about the IV> granularity of procstat, where it marks the entire region if a single IV> superpage exists in it - it means any such analysis is only IV> approximate. Here is a patch (for kernel and procstat) that allows to see amount of pages mapped to superpages. http://people.freebsd.org/~trociny/procstat-superpages.cnt.1.patch Not sure it is useful enough to be committed. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: a sysctl for process binary osreldate
On Thu, 22 Mar 2012 22:38:15 +0200 Mikolaj Golub wrote: MG> Actually I don't see reasons why this may not be p_cansee, so I MG> updated the patch and going to commit it if there is no objections. The updated patch: http://people.freebsd.org/~trociny/kern_proc_osrel.2.patch -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: a sysctl for process binary osreldate
On Sat, Mar 17, 2012 at 9:30 PM, Mikolaj Golub wrote: > Hi, > > Currently we can check and change binary osreldate of another process via > procfs(5). > > Kostik suggested to add a new sysctl for the same purpose and also extend > procstat to show osrel. > > Here are patches I am going to commit if there are no objections or > suggestions. > > http://people.freebsd.org/~trociny/kern_proc_osrel.1.patch > http://people.freebsd.org/~trociny/procstat.osrel.1.patch > > I set the same permissions as for procfs(5) osrel -- so only user can read it, > but may be this is too restrictive and p_cansee on read would be ok? Actually I don't see reasons why this may not be p_cansee, so I updated the patch and going to commit it if there is no objections. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: a sysctl for process binary osreldate
On Sat, 17 Mar 2012 22:29:01 +0100 Jilles Tjoelker wrote: JT> On Sat, Mar 17, 2012 at 09:30:05PM +0200, Mikolaj Golub wrote: >> I added osrel output to procstat -b option: >> kopusha:~% procstat -b 2975 >> PID COMMOSREL PATH >> 2975 emacs 101 /usr/local/bin/emacs-23.3 >> Would this be ok or someone see a better way? JT> Hmm, this means that procstat is not supposed to be used from scripts as JT> it is apparently OK to change its output format like this? Yes, breaking output compatibility worries me too. Although I already broke it recently for '-s' option, adding umask output. Let me cite Robert (taken from our then discussion about procstat umask output): > if we add too many arguments we'll start looking like ps(1), whereas the > point of procstat(1) is that it's *not* ps(1) :-). That is why I decided to not introduce yet another option here too at the cost of breaking compatibility. But I am open for any suggestions. JT> In some ways, querying via ps would be better for scripts since it JT> allows things like JT> ps -p PID -o KEYWORD= JT> which do not need additional parsing except that many of the newer JT> things in procstat do not have ps keywords. JT> -- JT> Jilles Tjoelker -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: a sysctl for process binary osreldate
On Sat, 17 Mar 2012 23:26:53 +0200 Konstantin Belousov wrote: KB> On Sat, Mar 17, 2012 at 11:07:24PM +0200, Mikolaj Golub wrote: >> >> On Sat, 17 Mar 2012 16:37:02 -0400 Jason Hellenthal wrote: >> >> JH> Would this be a planned MFC to stable/N as well specifcially 8 ? >> >> I plan to MFC to stable/9 if there is no objections. KB> I do not see why the merge to stable/8 cannot be done from the technical KB> POV. If Mikolaj has no time or desire to merge to 8, I can help him. If people consider this to be useful, no problem for me to merge to 8. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: a sysctl for process binary osreldate
On Sat, 17 Mar 2012 16:37:02 -0400 Jason Hellenthal wrote: JH> Would this be a planned MFC to stable/N as well specifcially 8 ? I plan to MFC to stable/9 if there is no objections. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
a sysctl for process binary osreldate
Hi, Currently we can check and change binary osreldate of another process via procfs(5). Kostik suggested to add a new sysctl for the same purpose and also extend procstat to show osrel. Here are patches I am going to commit if there are no objections or suggestions. http://people.freebsd.org/~trociny/kern_proc_osrel.1.patch http://people.freebsd.org/~trociny/procstat.osrel.1.patch I set the same permissions as for procfs(5) osrel -- so only user can read it, but may be this is too restrictive and p_cansee on read would be ok? I added osrel output to procstat -b option: kopusha:~% procstat -b 2975 PID COMMOSREL PATH 2975 emacs 101 /usr/local/bin/emacs-23.3 Would this be ok or someone see a better way? -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Sun, 4 Dec 2011 15:57:06 + Robert N. M. Watson wrote: RNMW> On 4 Dec 2011, at 14:31, Jilles Tjoelker wrote: >> On Sat, Oct 29, 2011 at 01:32:39PM +0300, Mikolaj Golub wrote: >>> [KERN_PROC_AUXV requires just p_cansee()] >> >> If we are ever going to do ASLR, the AUXV information tells an attacker >> where the stack, executable and RTLD are located, which defeats much of >> the point of randomizing the addresses in the first place. >> >> Given that the AUXV information seems to be used by debuggers only >> anyway, I think it would be good to move it to p_candebug() now. >> >> The full virtual memory maps (KERN_PROC_VMMAP, procstat -v) are already >> under p_candebug(). RNMW> Agreed. In general, my view is that p_cansee() should be used for very RNMW> few of our process inspection APIs. I like your example of ASLR RNMW> especially, as it illustrates how debugging information can aid even RNMW> local attacks (i.e., user vs. setuid binary). What do you think about recently added kern.proc.ps_strings, which returns location of ps_strings structure? It uses p_cansee() too. The location is the same for all processes of the same ABI, so this does not look like sensitive information, on the other hand it also seems to be used by debuggers only. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Sun, 4 Dec 2011 15:57:06 + Robert N. M. Watson wrote: RNMW> On 4 Dec 2011, at 14:31, Jilles Tjoelker wrote: >> On Sat, Oct 29, 2011 at 01:32:39PM +0300, Mikolaj Golub wrote: >>> [KERN_PROC_AUXV requires just p_cansee()] >> >> If we are ever going to do ASLR, the AUXV information tells an attacker >> where the stack, executable and RTLD are located, which defeats much of >> the point of randomizing the addresses in the first place. >> >> Given that the AUXV information seems to be used by debuggers only >> anyway, I think it would be good to move it to p_candebug() now. >> >> The full virtual memory maps (KERN_PROC_VMMAP, procstat -v) are already >> under p_candebug(). RNMW> Agreed. In general, my view is that p_cansee() should be used for very RNMW> few of our process inspection APIs. I like your example of ASLR RNMW> especially, as it illustrates how debugging information can aid even RNMW> local attacks (i.e., user vs. setuid binary). Thanks! I will change it to p_candebug(). -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Wed, 9 Nov 2011 14:53:29 +0200 Kostik Belousov wrote: >> > http://people.freebsd.org/~trociny/env.sys.4.patch >> > >> > Investigating cases when EFAULT was returned and if the fallback was >> > successful I noticed that most of the cases were when p->p_comm changed >> > during >> > the read, so the process was in exec in that time. In order to avoid this >> > error I added a check for P_INEXEC flag. >> And now you return success and nothing gets copied out for the process >> in P_INEXEC state. Either you should return an error like EAGAIN, or >> consider the P_INEXEC state as transitional and wait till process >> leaves it. Or, ignore the state as it was before, and return whatever >> error proc_rwmem generated (my preference). KB> Forgot to say that the check does not change much because you drop KB> process lock immediately after the check, so the process may enter KB> the INEXEC state right after the check. I believe you already tried KB> to do this with P_WEXIT. Ok, eventually I decided not to check for P_INEXEC (as the simplest :-). The updated patch: http://people.freebsd.org/~trociny/env.sys.5.patch -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Wed, 09 Nov 2011 15:31:26 +0200 Mikolaj Golub wrote: MG> On Wed, 9 Nov 2011 14:53:29 +0200 Kostik Belousov wrote: >>> And now you return success and nothing gets copied out for the process >>> in P_INEXEC state. Either you should return an error like EAGAIN, or >>> consider the P_INEXEC state as transitional and wait till process >>> leaves it. Or, ignore the state as it was before, and return whatever >>> error proc_rwmem generated (my preference). KB>> Forgot to say that the check does not change much because you drop KB>> process lock immediately after the check, so the process may enter KB>> the INEXEC state right after the check. I believe you already tried KB>> to do this with P_WEXIT. MG> Good point :-). Although after adding the P_INEXEC I have not seen errors any MG> more, while before they were often (when running 'procstat -ca' in loop and MG> building world simultaneously). Thus it looks like the probability is much MG> smaller. MG> So, it still looks good for me to check for P_INEXEC and return EAGAIN, and MG> add the comment why we do this and that it still racy. But if you still think MG> that ignoring the state is the best option no problems for me to return it MG> back. Realted to this, sysctl_kern_proc_kstack() looks like has the similar issue. But it returns ESRCH instead. /* XXXRW: Not clear ESRCH is the right error during proc execve(). */ if (p->p_flag & P_WEXIT || p->p_flag & P_INEXEC) { PROC_UNLOCK(p); return (ESRCH); } ... _PHOLD(p); PROC_UNLOCK(p); -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Wed, 9 Nov 2011 14:53:29 +0200 Kostik Belousov wrote: >> And now you return success and nothing gets copied out for the process >> in P_INEXEC state. Either you should return an error like EAGAIN, or >> consider the P_INEXEC state as transitional and wait till process >> leaves it. Or, ignore the state as it was before, and return whatever >> error proc_rwmem generated (my preference). KB> Forgot to say that the check does not change much because you drop KB> process lock immediately after the check, so the process may enter KB> the INEXEC state right after the check. I believe you already tried KB> to do this with P_WEXIT. Good point :-). Although after adding the P_INEXEC I have not seen errors any more, while before they were often (when running 'procstat -ca' in loop and building world simultaneously). Thus it looks like the probability is much smaller. So, it still looks good for me to check for P_INEXEC and return EAGAIN, and add the comment why we do this and that it still racy. But if you still think that ignoring the state is the best option no problems for me to return it back. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Wed, 9 Nov 2011 14:44:55 +0200 Kostik Belousov wrote: KB> On Tue, Nov 08, 2011 at 11:47:54PM +0200, Mikolaj Golub wrote: >> >> http://people.freebsd.org/~trociny/env.sys.4.patch >> >> Investigating cases when EFAULT was returned and if the fallback was >> successful I noticed that most of the cases were when p->p_comm changed >> during >> the read, so the process was in exec in that time. In order to avoid this >> error I added a check for P_INEXEC flag. KB> And now you return success and nothing gets copied out for the process KB> in P_INEXEC state. This looked ok for me: new arguments have not been created for the process yet, so return nothing :-) KB> Either you should return an error like EAGAIN, or consider the P_INEXEC KB> state as transitional and wait till process leaves it. Or, ignore the KB> state as it was before, and return whatever error proc_rwmem generated KB> (my preference). I prefer EAGAIN :-). Reading in the process space that is being currrently updated does not look good for me. And EAGAIN gives a hint that if I try it again I will probably get the result, while EFAULT looks mysterious. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Sun, 6 Nov 2011 20:10:41 +0200 Kostik Belousov wrote: KB> On Sat, Nov 05, 2011 at 10:37:46PM +0200, Mikolaj Golub wrote: >> >> http://people.freebsd.org/~trociny/env.sys.3.patch KB> Oops, I missed this in the previous review. You cannot use fubyte in KB> proc_read_mem(). fubyte reads a byte from the address space of the current KB> process. The fix is easy, use proc_rwmem for 1 byte. KB> I do not think that fall back to single byte read is warranted for KB> proc_read_mem calls e.g. for ps_strings. Add a flag to indicate whether KB> the proc_read_mem should fall back to byte read ? KB> I would prefer using sizeof(uint64_t) and sizeof(uint32_t) instead of 8 KB> and 4 constants in the align checks. KB> Might be, add PROC_ASSERT_HELD() to get_ps_string() ? KB> procfs patch looks good. Thanks. The updated version: http://people.freebsd.org/~trociny/env.sys.4.patch Investigating cases when EFAULT was returned and if the fallback was successful I noticed that most of the cases were when p->p_comm changed during the read, so the process was in exec in that time. In order to avoid this error I added a check for P_INEXEC flag. After this I observed EFAULT (very rarely) only when reading arg or env strings and fallback was successful for those cases. So I modified the patch to do fallback only when reading strings (as it was in one of my earlier versions but with wrong fubyte), and returned your comment which explains why it may happen :-) Also in the procfs patch I have added the check for process state. The userland part has not been changed since my first report: http://people.freebsd.org/~trociny/env.user.patch -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Sat, 5 Nov 2011 15:58:01 +0200 Kostik Belousov wrote: KB> procfs_doproccmdline() can benefit from your work. Patch for procfs: http://people.freebsd.org/~trociny/procfs.patch -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Sat, 5 Nov 2011 21:45:53 +0200 Kostik Belousov wrote: KB> On Sat, Nov 05, 2011 at 08:59:21PM +0200, Mikolaj Golub wrote: >> >> On Sat, 5 Nov 2011 17:44:43 +0200 Kostik Belousov wrote: >> >> >> KB> I think that the aux vector must be naturally aligned. You can >> return >> >> KB> ENOEXEC early if vptr is not aligned. >> >> >> >> Not sure I see what you mean. vptr for auxv is calculated just couple >> lines >> >> above, and I check the result here, in the part common for all vector >> types. >> KB> You do not check for the alignment. Am I wrong ? >> >> I see now. If natural alignment means "addr % sizeof(aux) == 0" then the aux >> vectors are not naturally aligned. After adding this check: >> >> if (vptr % sizeof(aux) != 0) >> return (ENOEXEC); KB> No, the natural alignment of the structure is the alignment of the most KB> demanding member. So it is 4 bytes on 32bit, and 8 bytes on 64. >> >> I started to observe many ENOEXEC errors. Adding printf showed that the >> vectors are half size aligned. >> >> On i386: >> >> get_proc_vector(pid = getty[3442], type = 2): vptr (2143284876) % >> sizeof(aux) (8) = 4) >> >> On amd64: >> >> get_proc_vector(pid = getty[2425], type = 2): vptr (140737488346568) % >> sizeof(aux) (16) = 8) >> >> Looking at exec_copyout_strings() from kern_exec.c, how destp is >> calculated, I >> think they are sizeof(char *) aligned. >> >> Do you think it is worth adding the check for sizeof(char *) alignment? >> >> if (vptr % (sizeof(char *) != 0) >> return (ENOEXEC); KB> I suggest to use #if __ELF_WORD_SIZE == 32 or 64. Thanks. The updated patch: http://people.freebsd.org/~trociny/env.sys.3.patch -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Sat, 5 Nov 2011 17:44:43 +0200 Kostik Belousov wrote: >> KB> I think that the aux vector must be naturally aligned. You can return >> KB> ENOEXEC early if vptr is not aligned. >> >> Not sure I see what you mean. vptr for auxv is calculated just couple lines >> above, and I check the result here, in the part common for all vector types. KB> You do not check for the alignment. Am I wrong ? I see now. If natural alignment means "addr % sizeof(aux) == 0" then the aux vectors are not naturally aligned. After adding this check: if (vptr % sizeof(aux) != 0) return (ENOEXEC); I started to observe many ENOEXEC errors. Adding printf showed that the vectors are half size aligned. On i386: get_proc_vector(pid = getty[3442], type = 2): vptr (2143284876) % sizeof(aux) (8) = 4) On amd64: get_proc_vector(pid = getty[2425], type = 2): vptr (140737488346568) % sizeof(aux) (16) = 8) Looking at exec_copyout_strings() from kern_exec.c, how destp is calculated, I think they are sizeof(char *) aligned. Do you think it is worth adding the check for sizeof(char *) alignment? if (vptr % (sizeof(char *) != 0) return (ENOEXEC); -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Sat, 5 Nov 2011 15:58:01 +0200 Kostik Belousov wrote: KB> +if (error == EFAULT) { KB> +for (i = 0; i < len; i++) { KB> +c = fubyte(sptr + i); KB> +if (c < 0) KB> As a purely stylistical issue, compare with -1. KB> +return (EFAULT); KB> +buf[i] = (char)c; KB> +if (c == '\0') KB> +break; KB> +} KB> +error = 0; KB> +} KB> +return error; KB> Put () around error. Thanks. KB> +/* KB> + * Check that that the address is in user space. KB> + */ KB> +if (vptr + 1 < VM_MIN_ADDRESS + 1 || vptr >= VM_MAXUSER_ADDRESS) KB> +return (ENOEXEC); KB> Why is this needed ? I saw this check in libkvm for ps_argvstr and ps_envstr and decided to add it too. Just some additional check that ps_string fields, which can be overwritten by the user, look reasonable. If you think this is not very useful I remove it. KB> I think that the aux vector must be naturally aligned. You can return KB> ENOEXEC early if vptr is not aligned. Not sure I see what you mean. vptr for auxv is calculated just couple lines above, and I check the result here, in the part common for all vector types. BTW, investigating the cases when I got procstat: sysctl: kern.proc.args: 58002: 8: Exec format error they were because the PROC_VECTOR_MAX limit (512 entries, as it is in linprocfs and libkvm) is small for real world cases: get_proc_vector(pid = rm[47883], type = 0): vsize (3009) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[47883], type = 0): vsize (3009) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[47890], type = 0): vsize (3008) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[47890], type = 0): vsize (3008) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[47897], type = 0): vsize (4511) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[47897], type = 0): vsize (4511) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[47897], type = 0): vsize (4511) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[48044], type = 0): vsize (611) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[52189], type = 0): vsize (772) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[52192], type = 0): vsize (1157) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[55685], type = 0): vsize (1041) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[55687], type = 0): vsize (1040) > PROC_VECTOR_MAX (512)) get_proc_vector(pid = rm[55690], type = 0): vsize (1559) > PROC_VECTOR_MAX (512)) So I am going to change it to ARG_MAX and use independent limit (256 entries) for auxv. KB> Why the blank after the loop statement in get_ps_strings() ? Sorry, what blank you mean? I don't see it in get_ps_strings(). May be you mean the blank line in get_proc_vector() before return? KB> There shall be blank lines after the '{' in proc_getargv() and proc_getenvv(). Ah, sure. KB> Note that using cached pargs is somewhat inconsistent with the digging KB> into ps_strings. KB> procfs_doproccmdline() can benefit from your work. Thanks, I will look at it :-). -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Mon, 31 Oct 2011 11:49:48 +0200 Kostik Belousov wrote: KB> I suspect this is my bug: Reading the GET_PS_STRINGS_CHUNK_SZ may validly KB> return EFAULT if the string is shorter than the chunk and aligned at KB> the end of the page, assuming the next page is not mapped. There should KB> be a fallback to fubyte() read loop. I remember that copyinstr() was KB> unsuitable. Hm, I thought that this issue was only for reading arg and env strings (which could be shorter than GET_PS_STRINGS_CHUNK_SZ), but investigating the cases when EFAULT was returned in my tests (running buildworld and procstat in loop) I saw that it also returned when reading other objects (like struct ps_strings), and a fallback to fubyte() read loop was successful in those cases too. So I updated the patch to do fallback for any type of read (although it does not contain a good comment explaining why fubyte() read might succeed when proc_rwmem() failed). Also there were the cases when EFAULT was returned because arg vector contained the NULL pointer. I observed this for sh processes. In lib/libc/gen/setproctitle.c I found this comment: oargc = ps_strings->ps_nargvstr; oargv = ps_strings->ps_argvstr; for (i = len = 0; i < oargc; i++) { /* * The program may have scribbled into its * argv array, e.g., to remove some arguments. * If that has happened, break out before * trying to call strlen on a NULL pointer. */ if (oargv[i] == NULL) { oargc = i; break; } I have updated my patch to do the same. Running buildworld test after these changes I have observed EFAULT only once, for cc process, when argv contained a pointer to 0x40. Also, for kern.proc.args some times errors like below are observed: procstat: sysctl: kern.proc.args: 58002: 8: Exec format error And for kern.proc.env: procstat: sysctl: kern.proc.env: 81352: 16: Device busy But I have not investigated these cases yet. The update version: http://people.freebsd.org/~trociny/env.sys.2.patch -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Mon, 31 Oct 2011 11:49:48 +0200 Kostik Belousov wrote: KB> I think it is better to use sys/elf.h over the machine/elf.h. KB> Please change the comment for PROC_AUXV_MAX to "Safety limit on auxv size". KB> Also, it worth adding a comment saying that we are reading aux vectors twice, KB> first to get a size, second time to fetch a content, for simplicity. KB> When reading aux vector, if the PROC_AUXV_MAX entries are iterated over, KB> and we still not reached AT_NULL, the return error is 0. Was it intended ? KB> For PROC_ARG and PROC_ENV, you blindly trust the read values of the arg and KB> env vector sizes. This can easily cause kernel panics due to unability to KB> malloc the requested memory. I recommend to put some clump, and twice KB> of (PATH_MAX + ARG_MAX) is probably enough (see kern_exec.c, in particular, KB> exec_alloc_args). Also, you might use the swappable memory for the strings KB> as well, in the style of exec_alloc_args(). KB> I suspect this is my bug: Reading the GET_PS_STRINGS_CHUNK_SZ may validly KB> return EFAULT if the string is shorter than the chunk and aligned at KB> the end of the page, assuming the next page is not mapped. There should KB> be a fallback to fubyte() read loop. I remember that copyinstr() was KB> unsuitable. KB> The checks for P_WEXIT in the linprocfs routines look strange. Since KB> you are unlocking the process right after the check, it does not make KB> sense. In fact, the checks are not needed, I believe, since pseudofs KB> already did the hold (see e.g. pfs_read and pfs_visible). Here is an updated version of the patch. Also available at http://people.freebsd.org/~trociny/env.sys.1.patch I decided to use the same constant (PROC_VECTOR_MAX) for limiting both the number of arg or env strings and the numbex of aux vectors. Also I decided not to play with exec_alloc_args :-). -- Mikolaj Golub diff --git a/sys/sys/proc.h b/sys/sys/proc.h index fb97913..4949f98 100644 --- a/sys/sys/proc.h +++ b/sys/sys/proc.h @@ -168,6 +168,7 @@ struct p_sched; struct proc; struct procdesc; struct racct; +struct sbuf; struct sleepqueue; struct td_sched; struct thread; @@ -843,6 +844,10 @@ int p_canwait(struct thread *td, struct proc *p); struct pargs *pargs_alloc(int len); void pargs_drop(struct pargs *pa); void pargs_hold(struct pargs *pa); +int proc_getargv(struct thread *td, struct proc *p, struct sbuf *sb, + size_t nchr); +int proc_getenvv(struct thread *td, struct proc *p, struct sbuf *sb, + size_t nchr); void procinit(void); void proc_linkup0(struct proc *p, struct thread *td); void proc_linkup(struct proc *p, struct thread *td); diff --git a/sys/sys/sysctl.h b/sys/sys/sysctl.h index 1e879f5..99ea342 100644 --- a/sys/sys/sysctl.h +++ b/sys/sys/sysctl.h @@ -559,6 +559,8 @@ SYSCTL_ALLOWED_TYPES(UINT64, uint64_t *a; unsigned long long *b; ); #define KERN_PROC_VMMAP 32 /* VM map entries for process */ #define KERN_PROC_FILEDESC 33 /* File descriptors for process */ #define KERN_PROC_GROUPS 34 /* process groups */ +#define KERN_PROC_ENV 35 /* get environment */ +#define KERN_PROC_AUXV 36 /* get ELF auxiliary vector */ /* * KERN_IPC identifiers diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c index 998e7ca..cc7c746 100644 --- a/sys/kern/kern_proc.c +++ b/sys/kern/kern_proc.c @@ -41,6 +41,8 @@ __FBSDID("$FreeBSD$"); #include #include +#include +#include #include #include #include @@ -49,6 +51,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include #include @@ -75,6 +78,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #ifdef COMPAT_FREEBSD32 @@ -1356,6 +1360,290 @@ pargs_drop(struct pargs *pa) pargs_free(pa); } +static int +proc_read_mem(struct thread *td, struct proc *p, vm_offset_t offset, void* buf, +size_t len) +{ + struct iovec iov; + struct uio uio; + + iov.iov_base = (caddr_t)buf; + iov.iov_len = len; + uio.uio_iov = &iov; + uio.uio_iovcnt = 1; + uio.uio_offset = offset; + uio.uio_resid = (ssize_t)len; + uio.uio_segflg = UIO_SYSSPACE; + uio.uio_rw = UIO_READ; + uio.uio_td = td; + + return (proc_rwmem(p, &uio)); +} + +static int +proc_read_string(struct thread *td, struct proc *p, const char *sptr, char *buf, +size_t len) +{ + size_t i; + int error, c; + + error = proc_read_mem(td, p, (vm_offset_t)sptr, buf, len); + /* + * Reading the chunk may validly return EFAULT if the string is shorter + * than the chunk and is aligned at the end of the page, assuming the + * next page is not mapped. So if EFAULT is returned do a fallback to + * fubyte() read loop. + */ + if (error == EFAULT) { + for (i = 0; i < len; i++) { + c = fubyte(sptr + i); + if (c < 0) +return (EFAULT); + buf[i] = (char)c; + if (c == '\0') +break; + } + error = 0; + } + return error; +
Re: "ps -e" without procfs(5)
On Mon, 31 Oct 2011 11:49:48 +0200 Kostik Belousov wrote: KB> For PROC_ARG and PROC_ENV, you blindly trust the read values of the arg and KB> env vector sizes. This can easily cause kernel panics due to unability to KB> malloc the requested memory. I recommend to put some clump, and twice KB> of (PATH_MAX + ARG_MAX) is probably enough (see kern_exec.c, in particular, KB> exec_alloc_args). Also, you might use the swappable memory for the strings KB> as well, in the style of exec_alloc_args(). After looking at it more closely, I am not sure if I need to use exec_alloc_args. I malloc explicitly only for array vector (proc_vector). And actually it should be much smaller than 2 * (PATH_MAX + ARG_MAX). Currently in linprocfs the limit is 512 entries: #define MAX_ARGV_STR512 /* Max number of argv-like strings */ The same limit is in libkvm: /* * Check that there aren't an unreasonable number of arguments, * and that the address is in user space. Special test for * VM_MIN_ADDRESS as it evaluates to zero, but is not a simple zero * constant for some archs. We cannot use the pre-processor here and * for some archs the compiler would trigger a signedness warning. */ if (narg > 512 || addr + 1 < VM_MIN_ADDRESS + 1 || addr >= VM_MAXUSER_ADDRESS) return (0); (BTW, may be the VM_MIN_ADDRESS - VM_MAXUSER_ADDRESS is worth adding in my code too?) So it looks like I should use the same limit (512 * sizeof(char *)) for the allocated array. I could use exec_alloc_args() for the allocation but it would reqire some changes: I would have to free using kmem_free_wakeup(), which requires size of the region, while I return the number of entries. So I'd rather not use exec_alloc_args() for vector allocation because the benefit is not significant here. For strings I use sbuf and set it up using sbuf_new_for_sysctl. I could set it up manually as SBUF_FIXEDLEN allocating buf (up to 2 * (PATH_MAX + ARG_MAX)) with exec_alloc_args() but this would complicate things a little. Do you think it is worth doing? -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Mon, 31 Oct 2011 11:49:48 +0200 Kostik Belousov wrote: KB> On Sat, Oct 29, 2011 at 01:32:39PM +0300, Mikolaj Golub wrote: >> >> What do you think about the attached patch? This is a kernel >> part. COMPAT_FREEBSD32 has not been tested after the last update (just >> checked >> that it compiles): it looks I will not have access to amd64 box for testing >> during the weekend. I will test it after the weekend. >> >> Both kernel and userland parts are available here: >> >> http://people.freebsd.org/~trociny/env.sys.patch >> http://people.freebsd.org/~trociny/env.user.patch >> >> Currently there is an issue with procstat -x: if one tried to run it on 64 >> bit >> for a 32 bit process it would not detect this so would output a garbage. >> Could >> somebody recommend a way how to get this info about a process from userlend? KB> I think it is better to use sys/elf.h over the machine/elf.h. KB> Please change the comment for PROC_AUXV_MAX to "Safety limit on auxv size". KB> Also, it worth adding a comment saying that we are reading aux vectors twice, KB> first to get a size, second time to fetch a content, for simplicity. KB> When reading aux vector, if the PROC_AUXV_MAX entries are iterated over, KB> and we still not reached AT_NULL, the return error is 0. Was it intended ? According to kern_exec.c it is possible that a process doesn't have auxv at all. I don't know a way how to detect this. So because PROC_AUXV_MAX is much larger than expected amount of aux entries and we have not reached AT_NULL it is most likely the process doesn't have auxv and 0 length array (without error) is returned. If you think I should return a error in this situation, I can add this. Please tell me the error code I should return :-). Also, may be there is a sane way to check on auxv existence? KB> For PROC_ARG and PROC_ENV, you blindly trust the read values of the arg and KB> env vector sizes. This can easily cause kernel panics due to unability to KB> malloc the requested memory. I recommend to put some clump, and twice KB> of (PATH_MAX + ARG_MAX) is probably enough (see kern_exec.c, in particular, KB> exec_alloc_args). Also, you might use the swappable memory for the strings KB> as well, in the style of exec_alloc_args(). KB> I suspect this is my bug: Reading the GET_PS_STRINGS_CHUNK_SZ may validly KB> return EFAULT if the string is shorter than the chunk and aligned at KB> the end of the page, assuming the next page is not mapped. There should KB> be a fallback to fubyte() read loop. I remember that copyinstr() was KB> unsuitable. KB> The checks for P_WEXIT in the linprocfs routines look strange. Since KB> you are unlocking the process right after the check, it does not make KB> sense. In fact, the checks are not needed, I believe, since pseudofs KB> already did the hold (see e.g. pfs_read and pfs_visible). Ah, right. Unintentionally added when was adding the P_SYSTEM check. Thank you for all your comments. I will do this. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Tue, 25 Oct 2011 11:24:51 +0300 Kostik Belousov wrote: KB> On Tue, Oct 25, 2011 at 12:13:10AM +0300, Mikolaj Golub wrote: >> >> On Sun, 16 Oct 2011 20:10:05 +0300 Kostik Belousov wrote: >> >> KB> In my opinion, the way to implement the feature is to (re)use >> KB> linprocfs_doargv() and provide another kern.proc sysctl to retrieve the >> KB> argv and env vectors. Then, ps(1) and procstat(1) can use it, as well >> as >> KB> procfs and linprocfs inside the kernel. >> >> Thanks! I am testing a patch (without auxv vector so far) and have some >> questions. >> >> Original ps -e returns environment only for user owned processes (the >> access is >> restricted by the permissions of /proc/pid/mem file). My kern.proc.env >> sysctl >> does not have such a restriction. I suppose I should add it? What function I >> could use for this? >> >> BTW, linprocfs allows to read other user's environment. KB> linprocfs uses p_cansee() to check the permissions. There are sysctls KB> security.bsd.see_other_{ug}ids that control the behaviour. KB> I believe that the new sysctl shall use the same check. >> >> KB> While you are at the code, it would be useful to also export the auxv >> vector, >> KB> which is immediately before env. >> >> It looks I can find the location of auxv but what about the size? Or do you >> propose to extend struct ps_strings to store location and size of auxv? I >> could do this way... KB> No, extending ps_strings is not needed and it is too radical change. KB> The auxv vector must end by the AT_NULL aux entry. You can also artificially KB> limit the amount of read aux vectors to, say, 256, which is much more then KB> it is currently defined. What do you think about the attached patch? This is a kernel part. COMPAT_FREEBSD32 has not been tested after the last update (just checked that it compiles): it looks I will not have access to amd64 box for testing during the weekend. I will test it after the weekend. Both kernel and userland parts are available here: http://people.freebsd.org/~trociny/env.sys.patch http://people.freebsd.org/~trociny/env.user.patch Currently there is an issue with procstat -x: if one tried to run it on 64 bit for a 32 bit process it would not detect this so would output a garbage. Could somebody recommend a way how to get this info about a process from userlend? -- Mikolaj Golub diff --git a/sys/sys/proc.h b/sys/sys/proc.h index fb97913..4949f98 100644 --- a/sys/sys/proc.h +++ b/sys/sys/proc.h @@ -168,6 +168,7 @@ struct p_sched; struct proc; struct procdesc; struct racct; +struct sbuf; struct sleepqueue; struct td_sched; struct thread; @@ -843,6 +844,10 @@ int p_canwait(struct thread *td, struct proc *p); struct pargs *pargs_alloc(int len); void pargs_drop(struct pargs *pa); void pargs_hold(struct pargs *pa); +int proc_getargv(struct thread *td, struct proc *p, struct sbuf *sb, + size_t nchr); +int proc_getenvv(struct thread *td, struct proc *p, struct sbuf *sb, + size_t nchr); void procinit(void); void proc_linkup0(struct proc *p, struct thread *td); void proc_linkup(struct proc *p, struct thread *td); diff --git a/sys/sys/sysctl.h b/sys/sys/sysctl.h index 1e879f5..99ea342 100644 --- a/sys/sys/sysctl.h +++ b/sys/sys/sysctl.h @@ -559,6 +559,8 @@ SYSCTL_ALLOWED_TYPES(UINT64, uint64_t *a; unsigned long long *b; ); #define KERN_PROC_VMMAP 32 /* VM map entries for process */ #define KERN_PROC_FILEDESC 33 /* File descriptors for process */ #define KERN_PROC_GROUPS 34 /* process groups */ +#define KERN_PROC_ENV 35 /* get environment */ +#define KERN_PROC_AUXV 36 /* get ELF auxiliary vector */ /* * KERN_IPC identifiers diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c index 998e7ca..ef4055a 100644 --- a/sys/kern/kern_proc.c +++ b/sys/kern/kern_proc.c @@ -41,6 +41,7 @@ __FBSDID("$FreeBSD$"); #include #include +#include #include #include #include @@ -49,6 +50,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include #include @@ -66,6 +68,8 @@ __FBSDID("$FreeBSD$"); #include #include +#include + #ifdef DDB #include #endif @@ -1356,6 +1360,218 @@ pargs_drop(struct pargs *pa) pargs_free(pa); } +static int +proc_read_mem(struct thread *td, struct proc *p, vm_offset_t offset, void* buf, +size_t len) +{ + struct iovec iov; + struct uio uio; + + iov.iov_base = (caddr_t)buf; + iov.iov_len = len; + uio.uio_iov = &iov; + uio.uio_iovcnt = 1; + uio.uio_offset = offset; + uio.uio_resid = (ssize_t)len; + uio.uio_segflg = UIO_SYSSPACE; + uio.uio_rw = UIO_READ; + uio.uio_td = td; + + return (proc_rwmem(p, &uio)); +} + +#define PROC_AUXV_MAX 256 /* Limit on auxv size. */ + +enum proc_v
Re: "ps -e" without procfs(5)
On Tue, 25 Oct 2011 11:24:51 +0300 Kostik Belousov wrote: KB> On Tue, Oct 25, 2011 at 12:13:10AM +0300, Mikolaj Golub wrote: >> >> On Sun, 16 Oct 2011 20:10:05 +0300 Kostik Belousov wrote: >> >> KB> In my opinion, the way to implement the feature is to (re)use >> KB> linprocfs_doargv() and provide another kern.proc sysctl to retrieve the >> KB> argv and env vectors. Then, ps(1) and procstat(1) can use it, as well >> as >> KB> procfs and linprocfs inside the kernel. >> >> Thanks! I am testing a patch (without auxv vector so far) and have some >> questions. >> >> Original ps -e returns environment only for user owned processes (the >> access is >> restricted by the permissions of /proc/pid/mem file). My kern.proc.env >> sysctl >> does not have such a restriction. I suppose I should add it? What function I >> could use for this? >> >> BTW, linprocfs allows to read other user's environment. KB> linprocfs uses p_cansee() to check the permissions. There are sysctls KB> security.bsd.see_other_{ug}ids that control the behaviour. KB> I believe that the new sysctl shall use the same check. This looks reasonable for me. But I just wanted to be sure that this would be ok for other people, as my patch changes the system behavior: currently with security.bsd.see_other_{ug}ids and procfs (not linprocfs) mounted a user can see other users args but not env; after the change a user will see both args and env (until security.bsd.see_other_{ug}ids is off). >> >> KB> While you are at the code, it would be useful to also export the auxv >> vector, >> KB> which is immediately before env. >> >> It looks I can find the location of auxv but what about the size? Or do you >> propose to extend struct ps_strings to store location and size of auxv? I >> could do this way... KB> No, extending ps_strings is not needed and it is too radical change. KB> The auxv vector must end by the AT_NULL aux entry. You can also artificially KB> limit the amount of read aux vectors to, say, 256, which is much more then KB> it is currently defined. Thanks. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: "ps -e" without procfs(5)
On Sun, 16 Oct 2011 20:10:05 +0300 Kostik Belousov wrote: KB> In my opinion, the way to implement the feature is to (re)use KB> linprocfs_doargv() and provide another kern.proc sysctl to retrieve the KB> argv and env vectors. Then, ps(1) and procstat(1) can use it, as well as KB> procfs and linprocfs inside the kernel. Thanks! I am testing a patch (without auxv vector so far) and have some questions. Original ps -e returns environment only for user owned processes (the access is restricted by the permissions of /proc/pid/mem file). My kern.proc.env sysctl does not have such a restriction. I suppose I should add it? What function I could use for this? BTW, linprocfs allows to read other user's environment. KB> While you are at the code, it would be useful to also export the auxv vector, KB> which is immediately before env. It looks I can find the location of auxv but what about the size? Or do you propose to extend struct ps_strings to store location and size of auxv? I could do this way... -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
"ps -e" without procfs(5)
Hi, I have a patch that makes kvm_uread() read from user space using ptrace(2). http://people.freebsd.org/~trociny/kvm_uread.ptrace.patch With this change 'ps -e' does not requires procfs(5). Do you like it or there might be some reasons why it is a bad idea? Grepping sources it looks like currently only ps uses kvm_getenvv(3) (and thus kvm_uread()). Note, when reading from its own user space it just does bcopy(3), so if a wrong address range is passed to kvm_uread() the program will segfault. Do we need some protection here and what? Masking SIGSEGV? -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: issues with kern.devstat.all
On Tue, 22 Mar 2011 00:15:06 +0300 Andrey Zonov wrote: AZ> Hi, AZ> This sysctl contains a binary data. You can see it using -o or -x AZ> sysctl's key. AZ> Additional information is at devstat(3) manpage. Or try sysutils/devstat :-) -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: flowtable_cleaner/flowtable_flush livelock
On Sat, 20 Nov 2010 20:04:35 + (UTC) Bjoern A. Zeeb wrote: BAZ> How do you reproduce the crash? Is it just another ifioctl race as BAZ> from kern/146250? Using the same script I posted in my first mail, removing a jail and epair interface simultaneously: ifconfig epair0b vnet myjail jail -r myjail & ifconfig epair0a destroy For me it loooks like other thread is destroying interface improperly in that time. One time I saw crash in another thread: (kgdb) bt #0 doadump () at pcpu.h:231 #1 0xc04f2439 in db_fncall (dummy1=1, dummy2=0, dummy3=-1056689728, dummy4=0xc2ba5984 "") at /usr/src/sys/ddb/db_command.c:548 #2 0xc04f2831 in db_command (last_cmdp=0xc0e75cfc, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:445 #3 0xc04f298a in db_command_loop () at /usr/src/sys/ddb/db_command.c:498 #4 0xc04f48ad in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:229 #5 0xc090face in kdb_trap (type=12, code=0, tf=0xc2ba5bf8) at /usr/src/sys/kern/subr_kdb.c:546 #6 0xc0c3d2bf in trap_fatal (frame=0xc2ba5bf8, eva=3735929066) at /usr/src/sys/i386/i386/trap.c:971 #7 0xc0c3d4f0 in trap_pfault (frame=0xc2ba5bf8, usermode=0, eva=3735929066) at /usr/src/sys/i386/i386/trap.c:893 #8 0xc0c3dca5 in trap (frame=0xc2ba5bf8) at /usr/src/sys/i386/i386/trap.c:568 #9 0xc0c24a9c in calltrap () at /usr/src/sys/i386/i386/exception.s:168 #10 0xc09ad219 in vnet_destroy (vnet=0xc2f24240) at /usr/src/sys/net/vnet.c:284 #11 0xc08b5922 in prison_deref (pr=0xc3640800, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/kern_jail.c:2506 #12 0xc08b5ab0 in prison_complete (context=0xc3640800, pending=1) at /usr/src/sys/kern/kern_jail.c:2433 #13 0xc091c87b in taskqueue_run_locked (queue=0xc2dd6d80) at /usr/src/sys/kern/subr_taskqueue.c:247 #14 0xc091cf17 in taskqueue_thread_loop (arg=0xc0ebb8e8) at /usr/src/sys/kern/subr_taskqueue.c:379 #15 0xc08af558 in fork_exit (callout=0xc091ceb0 , arg=0xc0ebb8e8, frame=0xc2ba5d28) at /usr/src/sys/kern/kern_fork.c:835 #16 0xc0c24b44 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275 (kgdb) fr 10 #10 0xc09ad219 in vnet_destroy (vnet=0xc2f24240) at /usr/src/sys/net/vnet.c:284 284 TAILQ_FOREACH_SAFE(ifp, &V_ifnet, if_link, nifp) { (kgdb) list 279 VNET_LIST_WUNLOCK(); 280 281 CURVNET_SET_QUIET(vnet); 282 283 /* Return all inherited interfaces to their parent vnets. */ 284 TAILQ_FOREACH_SAFE(ifp, &V_ifnet, if_link, nifp) { 285 if (ifp->if_home_vnet != ifp->if_vnet) 286 if_vmove(ifp, ifp->if_home_vnet); 287 } 288 (kgdb) p ifp $1 = (struct ifnet *) 0xdeadc0de Doesn't this need some lock protection? I tried the attached patch, but still observed crashes in ifioctl I posted earlier. -- Mikolaj Golub Index: sys/net/vnet.c === --- sys/net/vnet.c (revision 215576) +++ sys/net/vnet.c (working copy) @@ -268,7 +268,7 @@ vnet_alloc(void) void vnet_destroy(struct vnet *vnet) { - struct ifnet *ifp, *nifp; + struct ifnet *ifp; SDT_PROBE2(vnet, functions, vnet_destroy, entry, __LINE__, vnet); KASSERT(vnet->vnet_sockcnt == 0, @@ -281,10 +281,20 @@ vnet_destroy(struct vnet *vnet) CURVNET_SET_QUIET(vnet); /* Return all inherited interfaces to their parent vnets. */ - TAILQ_FOREACH_SAFE(ifp, &V_ifnet, if_link, nifp) { - if (ifp->if_home_vnet != ifp->if_vnet) + do { + IFNET_RLOCK(); + TAILQ_FOREACH(ifp, &V_ifnet, if_link) { + if (ifp->if_home_vnet != ifp->if_vnet) { +if_ref(ifp); +break; + } + } + IFNET_RUNLOCK(); + if (ifp != NULL) { if_vmove(ifp, ifp->if_home_vnet); - } + if_rele(ifp); + } + } while (ifp != NULL); vnet_sysuninit(); CURVNET_RESTORE(); ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: flowtable_cleaner/flowtable_flush livelock
On Sat, 20 Nov 2010 17:03:13 + (UTC) Bjoern A. Zeeb wrote: BAZ> I think net@ would have been a better initial place but since this BAZ> seems to be a problem when interacting with VIMAGE BAZ> freebsd-virtualization might be better. BAZ> What you could try is: BAZ> http://people.freebsd.org/~bz/20100216-10-ft-cv.diff Ah, I have recalled I had already saw this patch but did not understand what the problem was that it fixed, thus did not associated it with my case (actually, I thought you had committed all these patches to the tree long time ago and I was running the kernel with them already :-). BTW, the patch needs updating: in the current flow_full() wakes up flowcleaner too, and flowcleaner sleeps for flowclean_freq instead of 10*hz (see the attached patch). With the patch I can't reproduce the lock. Only the crash I mentioned in my first letter is observed: (kgdb) bt #0 doadump () at pcpu.h:231 #1 0xc04f2789 in db_fncall (dummy1=1, dummy2=0, dummy3=-1056677760, dummy4=0xc8731860 "") at /usr/src/sys/ddb/db_command.c:548 #2 0xc04f2b81 in db_command (last_cmdp=0xc0e79f7c, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:445 #3 0xc04f2cda in db_command_loop () at /usr/src/sys/ddb/db_command.c:498 #4 0xc04f4bfd in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:229 #5 0xc09119be in kdb_trap (type=12, code=0, tf=0xc8731a94) at /usr/src/sys/kern/subr_kdb.c:546 #6 0xc0c3da8f in trap_fatal (frame=0xc8731a94, eva=3735929074) at /usr/src/sys/i386/i386/trap.c:970 #7 0xc0c3e0be in trap (frame=0xc8731a94) at /usr/src/sys/i386/i386/trap.c:361 #8 0xc0c272dc in calltrap () at /usr/src/sys/i386/i386/exception.s:168 #9 0xc0988415 in strncmp (s1=0xc1fee4e0 "epair20b", s2=0xdeadc0f2 , n=16) at /usr/src/sys/libkern/strncmp.c:44 #10 0xc09929d7 in ifunit_ref (name=0xc1fee4e0 "epair20b") at /usr/src/sys/net/if.c:1986 #11 0xc0996982 in ifioctl (so=0xc25649c0, cmd=3223349536, data=0xc1fee4e0 "epair20b", td=0xc286c000) at /usr/src/sys/net/if.c:2475 #12 0xc09307f7 in soo_ioctl (fp=0xc1ff5af0, cmd=3223349536, data=0xc1fee4e0, active_cred=0xc1d83e80, td=0xc286c000) at /usr/src/sys/kern/sys_socket.c:212 #13 0xc092a61d in kern_ioctl (td=0xc286c000, fd=3, com=3223349536, data=0xc1fee4e0 "epair20b") at file.h:254 #14 0xc092a7a4 in ioctl (td=0xc286c000, uap=0xc8731cec) at /usr/src/sys/kern/sys_generic.c:679 #15 0xc091f303 in syscallenter (td=0xc286c000, sa=0xc8731ce4) at /usr/src/sys/kern/subr_trap.c:318 #16 0xc0c3dd2f in syscall (frame=0xc8731d28) at /usr/src/sys/i386/i386/trap.c:1094 #17 0xc0c27371 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:266 #18 0x0033 in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) fr 10 #10 0xc09929d7 in ifunit_ref (name=0xc1fee4e0 "epair20b") at /usr/src/sys/net/if.c:1986 1986if (strncmp(name, ifp->if_xname, IFNAMSIZ) == 0 && (kgdb) p ifp $1 = (struct ifnet *) 0xdeadc0de I might want to report it to freebsd-virtualization unless I find that this is a known issue. -- Mikolaj Golub Index: sys/net/flowtable.c === --- sys/net/flowtable.c (revision 215574) +++ sys/net/flowtable.c (working copy) @@ -195,7 +195,8 @@ STATIC_VNET_DEFINE(uma_zone_t, flow_ipv6_zone); #define V_flow_ipv6_zone VNET(flow_ipv6_zone) -static struct cv flowclean_cv; +static struct cv flowclean_f_cv; +static struct cv flowclean_c_cv; static struct mtx flowclean_lock; static uint32_t flowclean_cycles; static uint32_t flowclean_freq; @@ -951,7 +952,7 @@ flow_full(struct flowtable *ft) if ((ft->ft_flags & FL_HASH_ALL) == 0) ft->ft_udp_idle = ft->ft_fin_wait_idle = ft->ft_syn_idle = ft->ft_tcp_idle = 5; - cv_broadcast(&flowclean_cv); + cv_broadcast(&flowclean_c_cv); } else if (!full && ft->ft_full) { flowclean_freq = 20*hz; if ((ft->ft_flags & FL_HASH_ALL) == 0) @@ -1560,14 +1561,14 @@ flowtable_cleaner(void) } VNET_LIST_RUNLOCK(); - flowclean_cycles++; /* * The 10 second interval between cleaning checks * is arbitrary */ mtx_lock(&flowclean_lock); - cv_broadcast(&flowclean_cv); - cv_timedwait(&flowclean_cv, &flowclean_lock, flowclean_freq); + flowclean_cycles++; + cv_broadcast(&flowclean_f_cv); + cv_timedwait(&flowclean_c_cv, &flowclean_lock, 10*hz); mtx_unlock(&flowclean_lock); } } @@ -1580,8 +1581,8 @@ flowtable_flush(void *unused __unused) mtx_lock(&flowclean_lock); start = flowclean_cycles; while (start == flowclean_cycles) { - cv_broadcast(&flowclean_cv); - cv_wait(&flowclean_cv, &flowclean_lock); + cv_broadcast(&flowclean_c_cv); + cv_wait(&flowclean_f_cv, &flowclean_lock); } mtx_unlock(&flowclean_lock); } @@ -1613,7 +1614,8 @@ static vo
Re: flowtable_cleaner/flowtable_flush livelock
On Sat, 20 Nov 2010 17:03:13 + (UTC) Bjoern A. Zeeb wrote: BAZ> On Sat, 20 Nov 2010, Mikolaj Golub wrote: BAZ> Hi, >> Running something like below under VirtualBox (CURRENT, VIMAGE) BAZ> ... >> So the question is who is guilty in this situation? ULE? flowtable? Or >> jail/epair, which should not allow simultaneous entering of flowtable_flush? BAZ> In general: you for running an experimental feature;-) I like experimenting :-) BAZ> What you could try is: BAZ> http://people.freebsd.org/~bz/20100216-10-ft-cv.diff I will. Thanks. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
flowtable_cleaner/flowtable_flush livelock
0004,33b,c1fd47f8,...) at flowtable_cleaner+0x255 fork_exit(c0990630,0,c43c7d28) at fork_exit+0xb8 fork_trampoline() at fork_trampoline+0x8 In net/flowtable.c we have two functions: static void flowtable_cleaner(void) { ... while (1) { ... flowclean_cycles++; mtx_lock(&flowclean_lock); cv_broadcast(&flowclean_cv); cv_timedwait(&flowclean_cv, &flowclean_lock, flowclean_freq); mtx_unlock(&flowclean_lock); } } static void flowtable_flush(void *unused __unused) { uint64_t start; mtx_lock(&flowclean_lock); start = flowclean_cycles; while (start == flowclean_cycles) { cv_broadcast(&flowclean_cv); cv_wait(&flowclean_cv, &flowclean_lock); } mtx_unlock(&flowclean_lock); } It looks like when two threads enter flowtable_flush() simultaneously they start to wake up each other not giving to flowcleaner thread (which is in RUNQ) a chance to run (I suppose because it has higher priority number) and update flowclean_cycles counter. I added print in flowtable_flush() loop to check my assumption and got: flowtable_flush: start(C43FEB14): 23; flowclean_cycles: 23 flowtable_flush: start(C87439F8): 23; flowclean_cycles: 23 flowtable_flush: start(C43FEB14): 23; flowclean_cycles: 23 flowtable_flush: start(C87439F8): 23; flowclean_cycles: 23 flowtable_flush: start(C43FEB14): 23; flowclean_cycles: 23 flowtable_flush: start(C87439F8): 23; flowclean_cycles: 23 flowtable_flush: start(C43FEB14): 23; flowclean_cycles: 23 flowtable_flush: start(C87439F8): 23; flowclean_cycles: 23 ... So the question is who is guilty in this situation? ULE? flowtable? Or jail/epair, which should not allow simultaneous entering of flowtable_flush? -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: close() failing with ECONNRESET
On Wed, 9 Jun 2010 12:45:52 +0300 Kostik Belousov wrote: KB> On Wed, Jun 09, 2010 at 12:35:11AM -0700, per...@pluto.rain.com wrote: >> Timo Sirainen wrote: >> >> > I see that since FreeBSD 6.3 close() can fail with: >> > >> > > [ECONNRESET]The underlying object was a stream socket that was >> > > shut down by the peer before all pending data was >> > > delivered. >> > >> > Could someone explain what this is useful for? KB> Note that any return from close(2) that does not set errno to EBADF KB> closes the supplied file descriptor. Mentioned errno value supplies KB> caller with the information that not "all pending data was delivered". We have kern/146845 about close(2) returning ECONNRESET for tcp connections. Looking at the code (which I am not very familiar with though) and running some tests make me think that currently ECONNRESET may be only returned by close(2) after shutdown()/close() on our side and simultaneous close() on the other side (and in this case this is wrong). -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: leak of the vnodes
On Sat, 3 Apr 2010 19:52:38 +0300 Kostik Belousov wrote: > Then, after you determined the problematic mp, reboot the machine, > redo the procedure causing leak. From ddb prompt, you can do "show mount", > find the mp, then do "show mount ". The later command shall > produce really large output, listing all mp vnodes, so serial console > or firewire can be useful. Put output somewhere. Or use ddb capture buffer :-). In ddb: capture on capture off continue And then ddb capture print > capture.out Make sure your capture buffer is large enough (I have in my /etc/sysctl.conf debug.ddb.capture.bufsize=5242880). -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: unix socket: race on close?
On Thu, 18 Feb 2010 11:59:40 + (GMT) Robert Watson wrote: > On Thu, 18 Feb 2010, Mikolaj Golub wrote: > >> Below is a simple test code with unix sockets: the client does >> connect()/close() in loop and the server -- accept()/close(). >> >> Sometimes close() fails with 'Socket is not connected' error: > > Hi Mikolaj: > > Thanks for this report, and sorry about not spotting your earlier post > to freebsd-net. I've been fairly preoccupied the last month and not > keeping up with the mailing lists. Could I ask you to file a PR on > this, and forward me the PR number so I can claim ownership? This > should prevent it from getting lost while I catch up. kern/144061 > In short, your evaluation seems reasonable to me -- have you tried > tweaking soclose() to ignore ENOTCONN from sodisconnect() to confirm > this diagnosis fixes all the instances you've been seeing? I just have done this: 1) add logging the error when sodisconnect() returns error: --- uipc_socket.c.orig 2010-02-18 14:25:25.0 +0200 +++ uipc_socket.c 2010-02-18 14:55:26.0 +0200 @@ -120,6 +120,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so #include #include #include +#include #include #include #include @@ -136,6 +137,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so #include + #ifdef COMPAT_IA32 #include #include @@ -657,7 +659,7 @@ soclose(struct socket *so) if ((so->so_state & SS_ISDISCONNECTING) == 0) { error = sodisconnect(so); if (error) - goto drop; + log(LOG_INFO, "soclose: sodisconnect error: %d\n", error); } if (so->so_options & SO_LINGER) { if ((so->so_state & SS_ISDISCONNECTING) && Then on every error exit of the test application, like this a.out: parent: close error: 57 I have in the message log: Feb 18 15:35:32 zhuzha kernel: soclose: sodisconnect error: 57 2) add logging the error when sodisconnect() returns error and ignore the error: --- uipc_socket.c.orig 2010-02-18 14:25:25.0 +0200 +++ uipc_socket.c 2010-02-18 15:41:07.0 +0200 @@ -120,6 +120,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so #include #include #include +#include #include #include #include @@ -136,6 +137,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so #include + #ifdef COMPAT_IA32 #include #include @@ -656,8 +658,11 @@ soclose(struct socket *so) if (so->so_state & SS_ISCONNECTED) { if ((so->so_state & SS_ISDISCONNECTING) == 0) { error = sodisconnect(so); - if (error) - goto drop; + if (error) { + log(LOG_INFO, "soclose: sodisconnect error: %d\n", error); + if (error == ENOTCONN) + error = 0; + } } if (so->so_options & SO_LINGER) { if ((so->so_state & SS_ISDISCONNECTING) && After this the test application does not exits and I see in the message log: Feb 18 16:02:37 zhuzha kernel: soclose: sodisconnect error: 57 Feb 18 16:03:31 zhuzha kernel: soclose: sodisconnect error: 57 Feb 18 16:05:49 zhuzha last message repeated 4 times Feb 18 16:15:50 zhuzha last message repeated 13 times -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
unix socket: race on close?
Hi, Below is a simple test code with unix sockets: the client does connect()/close() in loop and the server -- accept()/close(). Sometimes close() fails with 'Socket is not connected' error: a.out: parent: close error: 57 or a.out: child: close error: 57 It looks for me like some race in close(). Looking at uipc_socket.c:soclose(): int soclose(struct socket *so) { int error = 0; KASSERT(!(so->so_state & SS_NOFDREF), ("soclose: SS_NOFDREF on enter")); CURVNET_SET(so->so_vnet); funsetown(&so->so_sigio); if (so->so_state & SS_ISCONNECTED) { if ((so->so_state & SS_ISDISCONNECTING) == 0) { error = sodisconnect(so); if (error) goto drop; } Isn't the problem here? so_state is checked for SS_ISCONNECTED and SS_ISDISCONNECTING without locking and then sodisconnect() is called, which closes both sockets of the connection. So it looks for me that if the close() is called for both ends simultaneously it is possible that sodisconnect() will be called for both ends and for one ENOTCONN will be returned. Or may I have missed something? We have been observing periodically ENOTCONN errors on unix socket close in our applications, so it is not just curiosity :-) (I posted about our problem to freebsd-net@ some time ago but then did not attract any attention http://lists.freebsd.org/pipermail/freebsd-net/2009-December/024047.html). #include #include #include #include #include #include #include #include #include #include #include #include #include #define UNIXSTR_PATH "/tmp/mytest.socket" #define USLEEP 100 int main(int argc, char **argv) { int listenfd, connfd, pid; struct sockaddr_un servaddr; pid = fork(); if (-1 == pid) errx(1, "fork(): %d", errno); if (0 != pid) { /* parent */ if ((listenfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0) errx(1, "parent: socket error: %d", errno); unlink(UNIXSTR_PATH); bzero(&servaddr, sizeof(servaddr)); servaddr.sun_family = AF_LOCAL; strcpy(servaddr.sun_path, UNIXSTR_PATH); if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0) errx(1, "parent: bind error: %d", errno); if (listen(listenfd, 1024) < 0) errx(1, "parent: listen error: %d", errno); for ( ; ; ) { if ((connfd = accept(listenfd, (struct sockaddr *) NULL, NULL)) < 0) errx(1, "parent: accept error: %d", errno); //usleep(USLEEP / 2); // (I) uncomment this or (II) below to avoid the race if (close(connfd) < 0) errx(1, "parent: close error: %d", errno); } } else { /* child */ sleep(1); /* give the parent some time to create the socket */ for ( ; ; ) { if ((connfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0) errx(1, "child: socket error: %d", errno); bzero(&servaddr, sizeof(servaddr)); servaddr.sun_family = AF_LOCAL; strcpy(servaddr.sun_path, UNIXSTR_PATH); if (connect(connfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0) errx(1, "child: connect error %d", errno); // usleep(USLEEP); // (II) uncomment this or (I) above to avoid the race if (close(connfd) != 0) errx(1, "child: close error: %d", errno); usleep(USLEEP); } } return 0; } -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: crashtar
On Tue, 13 Oct 2009 22:50:44 +0200 (CEST) Alexander Best wrote: AB> again: great script. would be great to have this in the ports dir in the near AB> future. I have created separate google project for this script http://code.google.com/p/bsdcrashtar/ And submitted to ports http://www.freebsd.org/cgi/query-pr.cgi?pr=139721 BTW, many things in the script have been improved since I posted it here, and user friendly error output is among them :-). -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: crashtar
On Sat, 10 Oct 2009 12:34:05 +0200 (CEST) Alexander Best wrote: AB> thanks. this is a cool script and very useful indeed. only thing you might AB> want to do is check for root privileges at the beginning to avoid nasty error AB> messages like. AB> awk: can't open file /var/crash/info.0 AB> source line number 12 In some cases you might not need root privileges. E.g. on some servers I don't have root but SA gives me read access to crashdumps. In this case if the script had a check for root privileges I would not be able to use it. Actually as for me the message looks informative enough, it says that we have some problems with accessing crash dump files, so permissions should be checked. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: crashinfo: print the content of ddb capture budder
On Fri, 9 Oct 2009 11:28:11 -0400 John Baldwin wrote: JB> On Monday 05 October 2009 1:48:06 am Mikolaj Golub wrote: >> Hi, >> >> It would be nice if crashinfo(8) were also trying to output the content of >> ddb >> capture buffer. Something like in this patch: >> >> --- crashinfo.sh.orig2009-10-05 08:26:26.0 +0300 >> +++ crashinfo.sh2009-10-05 08:43:56.0 +0300 >> @@ -304,3 +304,18 @@ >> echo "kernel config" >> echo >> config -x $KERNEL >> + >> +file=`mktemp /tmp/crashinfo.XX` >> +if [ $? -eq 0 ]; then >> +ddb capture -M $VMCORE -N $KERNEL print > $file 2>/dev/null >> +if [ -s $file ]; then >> +echo >> "" >> +echo "ddb capture buffer" >> +echo >> +cat $file | >> +sed -e 's/p\{10\}p*//' # XXX: this removes the unfilled >> part of a capture buffer >> +echo >> +fi >> +rm -f $file >> +fi >> + >> JB> I'm definitely in favor of this. I assume you have tested it locally? Do you have a sample JB> crash.X.txt file with it enabled? I have tested it on 8.0. zhuzha:~% ls -l /var/crash/vmcore.23 -rw--- 1 root wheel 166703104 2009-10-05 08:03 /var/crash/vmcore.23 zhuzha:~% sudo crashinfo Writing crash summary to /var/crash/core.txt.23. zhuzha:~% grep -B5 -A30 'ddb capture buffer' /var/crash/core.txt.23 kernel config config: File /boot/kernel.old/kernel doesn't contain configuration file. Either unsupported, or not compiled with INCLUDE_CONFIG_FILE ddb capture buffer db:0:kdb.enter.panic> show pcpu cpuid= 0 dynamic pcpu= 0x68ee80 curthread= 0xc4a1ad80: pid 2276 "sysctl" curpcb = 0xe6d44d90 fpcurthread = none idlethread = 0xc4576900: pid 11 "idle: cpu0" APIC ID = 0 currentldt = 0x50 spin locks held: db:0:kdb.enter.panic> show allpcpu Current CPU: 0 cpuid= 0 dynamic pcpu= 0x68ee80 curthread= 0xc4a1ad80: pid 2276 "sysctl" curpcb = 0xe6d44d90 fpcurthread = none idlethread = 0xc4576900: pid 11 "idle: cpu0" APIC ID = 0 currentldt = 0x50 spin locks held: cpuid= 1 dynamic pcpu= 0x34ffe80 curthread= 0xc5837480: pid 2191 "screen" curpcb = 0xe6e5ed90 fpcurthread = none idlethread = 0xc4576b40: pid 11 "idle: cpu1" zhuzha:~% tail /var/crash/core.txt.23 mi_switch(104,0,c0c798d3,1d6,44,...) at mi_switch+0x200 sleepq_switch(c0dc8190,0,c0c798d3,26e,0,...) at sleepq_switch+0x15f sleepq_timedwait(c0dc7ee0,44,c0c7793c,0,0,...) at sleepq_timedwait+0x6b _sleep(c0dc7ee0,0,44,c0c7793c,2710,...) at _sleep+0x339 scheduler(0,141ec00,141ec00,141e000,1425000,...) at scheduler+0x23e mi_startup() at mi_startup+0x96 begin() at begin+0x2c db:0:kdb.enter.panic> call doadump zhuzha:~% Actually the last echo in the patch looks like is not necessary. Do you want the whole crash.23.txt file for review? Also, I remember I tested it on crashdump of a kernel without ddb support and no issues were noticed too. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
crashinfo: print the content of ddb capture budder
Hi, It would be nice if crashinfo(8) were also trying to output the content of ddb capture buffer. Something like in this patch: --- crashinfo.sh.orig 2009-10-05 08:26:26.0 +0300 +++ crashinfo.sh2009-10-05 08:43:56.0 +0300 @@ -304,3 +304,18 @@ echo "kernel config" echo config -x $KERNEL + +file=`mktemp /tmp/crashinfo.XX` +if [ $? -eq 0 ]; then + ddb capture -M $VMCORE -N $KERNEL print > $file 2>/dev/null + if [ -s $file ]; then + echo "" + echo "ddb capture buffer" + echo + cat $file | + sed -e 's/p\{10\}p*//' # XXX: this removes the unfilled part of a capture buffer + echo + fi + rm -f $file +fi + -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
crashtar
Hi, http://trociny.googlecode.com/files/crashtar This simple script is useful for me and might be useful for other people too. The script creates tar archive that contains all files needed for debugging FreeBSD kernel crash (vmcore, kernel, loaded modules, sources that appear in backtrace). This is useful e.g. for debugging a crash on another host, sending it to developers or if you are going to upgrade the kernel on crashed host but would like to keep crashdump in case the developers ask you to provide additional info. Created tar contains also a script that when being run inside unpacked archive will give kgdb(1) session with crash core loaded in it. The script should be run with root privileges because it does chroot(8) before starting kgdb(1). I think I don't have to warn here that a crashdump may be sent only to person you trust :-). -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
7.1 panicked removing namecache entry from cache
e_next = 0xcc30acf0, tqe_prev = 0xc0c4f3ac}, v_bufobj = { bo_mtx = 0xc89053c4, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xc8905400}, bv_root = 0x0, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xc8905410}, bv_root = 0x0, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_ops = 0xc0be8e40, bo_bsize = 16384, bo_object = 0x0, bo_synclist = {le_next = 0x0, le_prev = 0xc890520c}, bo_private = 0xc890533c, __bo_vnode = 0xc890533c}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0} (kgdb) fr 7 #7 0xc07fd34b in cache_zap (ncp=0xcc33783c) at /usr/src/sys/kern/vfs_cache.c:276 276 LIST_REMOVE(ncp, nc_hash); (kgdb) p *ncp $2 = {nc_hash = {le_next = 0x0, le_prev = 0x0}, nc_src = {le_next = 0xcc31650c, le_prev = 0xcc2c51e4}, nc_dst = {tqe_next = 0x0, tqe_prev = 0x0}, nc_dvp = 0x0, nc_vp = 0x0, nc_flag = 0 '\0', nc_nlen = 0 '\0', nc_name = 0xcc33785e ""} -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Partial kvm dumps
.g. SNMP looks like more proper alternative solution -- this is standard, also snmpd is actually that program which "traverse kernel structures extracting all necessary data". But SNMP has its own limitations, statistics provided via SNMP are rather limited and currently I don't see how I could use it effectively to echieve my goal, althogh I haven't think much in this direction yet... -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: possibly tmpfs bug
On Fri, 3 Jul 2009 16:07:56 +0200 (CEST) Wojciech Puchar wrote: WP> repeatable WP> put something on tmpfs filesystem, then download it to other machine WP> using ftp (server is ftpd on first machine). no errors, download is WP> fine, but you get rubbish - simply data from wrong places in memory. WP> using rcp works. most probably ftpd uses sendfile, while rcp does not Yes, this is sendfile problem. It has been reported. http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/127213 -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: 7.1-STABLE crash
On Tue, 02 Jun 2009 23:24:54 +0300 Mikolaj Golub wrote: > Actually some output from crashinfo looks suspicious. zero values for fork() > calls, negative values in vmstat -m output... Does the userland where you were > running crashinfo matched the crushed kernel? It looks like the strange figures in vmstat output because its '-M' option is broken. I am observing the same issue on my system. I have rebuild the kernel and the world of today's RELENG_7 to make sure that this is not the issue with my kernel not in sync with the userland. zhuzha:~% uname -a FreeBSD zhuzha.ua1 7.2-STABLE FreeBSD 7.2-STABLE #18: Wed Jun 3 14:28:49 EEST 2009 r...@zhuzha.ua1:/usr/obj/usr/src/sys/DEBUG i386 zhuzha:~% vmstat -m |head Type InUse MemUse HighUse Requests Size(s) ata_dma 3 1K -3 128 GEOM83 9K - 1176 16,32,64,128,256,512,1024,2048 isadev20 2K - 20 64 ar_driver 0 0K -6 512,2048 acd_driver 1 2K -1 2048 cdev27 4K - 28 128 kbdmux 6 9K -6 16,128,256,2048,4096 sigio 8 1K -8 32 filedesc 15246K - 1637 16,32,64,128,256,512,1024,2048,4096 zhuzha:~% sudo vmstat -m -M /dev/mem -N /boot/kernel/kernel |head Type InUse MemUse HighUse Requests Size(s) ata_dma-3 0K -0 GEOM 1124760 -1107K - 1125936 16,32,64,128,256,512,1024,2048,4096,8192 isadev -20 0K -0 ar_driver 19450 -18K -19456 32,64 acd_driver-1-1K -0 cdev 100-2K - 128 16 kbdmux-6-7K -0 sigio-8 0K -0 filedesc 384793 -422K - 386432 16,32,64,128,1024,2048,4096,8192 zhuzha:~% vmstat -s |head 2380238 cpu context switches 27952 device interrupts 573159 software interrupts 931165 traps 9997656 system calls 56 kernel threads created 1519 fork() calls 45 vfork() calls 0 rfork() calls 0 swap pager pageins zhuzha:~% sudo vmstat -s -M /dev/mem -N /boot/kernel/kernel |head 0 cpu context switches 0 device interrupts 0 software interrupts 0 traps 0 system calls 0 kernel threads created 0 fork() calls 0 vfork() calls 0 rfork() calls 0 swap pager pageins On CURRENT vmstat -m works ok but vmstat -s gives zero values: fbsd# uname -a FreeBSD fbsd.zhuzha.ua1 8.0-CURRENT FreeBSD 8.0-CURRENT #8 r193242M: Mon Jun 1 23:43:06 EEST 2009 r...@zhuzha.ua1:/home/golub/freebsd/build/obj/home/golub/freebsd/src/sys/GENERIC i386 fbsd# vmstat -m |head Type InUse MemUse HighUse Requests Size(s) cdev11 2K - 11 128 CAM dev queue 1 1K -1 64 sigio 1 1K -1 32 filedesc4913K - 1109 16,256,512 kenv 106 8K - 109 16,32,64,128,4096 kqueue 0 0K - 12 128,1024 proc-args29 2K - 593 16,32,64,128 ithread63 6K - 63 16,64,128 acpica 42623K - 8140 16,32,64,128,256,512,1024 fbsd# vmstat -m -M /dev/mem -N /boot/kernel/kernel |head Type InUse MemUse HighUse Requests Size(s) cdev11 2K - 11 128 CAM dev queue 1 1K -1 64 sigio 1 1K -1 32 filedesc4913K - 16,256,512 kenv 106 8K - 109 16,32,64,128,4096 kqueue 0 0K - 12 128,1024 proc-args29 2K - 599 16,32,64,128 ithread63 6K - 63 16,64,128 acpica 42623K - 8140 16,32,64,128,256,512,1024 fbsd# vmstat -s |head 88380 cpu context switches 5058 device interrupts 16851 software interrupts 138908 traps 140549 system calls 18 kernel threads created 1070 fork() calls 22 vfork() calls 0 rfork() calls 0 swap pager pageins fbsd# vmstat -s -M /dev/mem -N /boot/kernel/kernel |head 0 cpu context switches 0 device interrupts 0 software interrupts 0 traps 0 system calls 0 kernel threads created 0 fork() calls 0 vfork() calls 0 rfork() calls 0 swap pager pageins On 6.3-RELEASE vmstat -s gives zero values only for some counters: fbsd6# uname -a FreeBSD fbsd6.zhuzha.ua1 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Wed Jan 16 04:18:52 UTC 2008 r...@dessler.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386 fbsd6# vmstat -m |head Type InUse MemUse
Re: 7.1-STABLE crash
On Tue, 02 Jun 2009 13:41:40 +0400 Asmodean Dark wrote: AD> # kgdb kernel.debug vmcore.0 AD> GNU gdb 6.1.1 [FreeBSD] AD> Copyright 2004 Free Software Foundation, Inc. AD> GDB is free software, covered by the GNU General Public License, and you are AD> welcome to change it and/or distribute copies of it under certain conditions. AD> Type "show copying" to see the conditions. AD> There is absolutely no warranty for GDB. Type "show warranty" for details. AD> This GDB was configured as "i386-marcel-freebsd"...No struct type named linker_file. AD> No struct type named linker_file. AD> No struct type named linker_file. AD> Attempt to extract a component of a value that is not a structure. AD> No struct type named linker_file. AD> No struct type named linker_file. AD> No struct type named linker_file. AD> Attempt to extract a component of a value that is not a structure. AD> Attempt to extract a component of a value that is not a structure pointer. AD> Attempt to extract a component of a value that is not a structure pointer. AD> Attempt to extract a component of a value that is not a structure pointer. AD> Attempt to extract a component of a value that is not a structure pointer. AD> #0 0x8063d6b0 in doadump () AD> (kgdb) bt AD> #0 0x8063d6b0 in doadump () AD> #1 0x8063dc44 in boot () AD> #2 0x8063e0ca in panic () AD> #3 0x807dab3d in trap_fatal () AD> #4 0x807daeba in trap_pfault () AD> #5 0x807db7bd in trap () AD> #6 0x807c2a3b in calltrap () AD> #7 0x806dcb88 in rn_match () AD> #8 0x806ddc8a in rn_lookup () AD> #9 0x8070e460 in ipfw_chk (args=0xe70175fc) at ../../../netinet/ip_fw2.c:1894 AD> #10 0x80710c3d in ipfw_check_in (arg=0x0, m0=0xe7017700, ifp=0x91c5a800, dir=1, inp=0x0) at ../../../netinet/ip_fw_pfil.c:125 AD> #11 0x806dc20f in pfil_run_hooks () AD> #12 0x80713984 in ip_input (m=0x91954c00) at ../../../netinet/ip_input.c:416 AD> #13 0x806ec0d9 in ng_iface_rcvdata () AD> #14 0x806e9570 in ng_apply_item () AD> #15 0x806e8569 in ng_snd_item () AD> #16 0x806e9570 in ng_apply_item () AD> #17 0x806e8569 in ng_snd_item () AD> #18 0x806e9570 in ng_apply_item () AD> #19 0x806e8569 in ng_snd_item () AD> #20 0x806f16a7 in ng_ppp_proto_recv () AD> #21 0x806f3ed2 in ng_ppp_rcvdata () AD> #22 0x806e9570 in ng_apply_item () AD> #23 0x806e8569 in ng_snd_item () AD> #24 0x806e9570 in ng_apply_item () AD> #25 0x806e8569 in ng_snd_item () AD> #26 0x806ee3c3 in ng_ksocket_incoming2 () AD> #27 0x806e969d in ng_apply_item () AD> #28 0x806ea8aa in ngintr () AD> #29 0x806dab72 in swi_net () AD> #30 0x8061e265 in ithread_loop () AD> #31 0x8061adf5 in fork_exit () AD> #32 0x807c2ab0 in fork_trampoline () AD> (kgdb) fr 9 AD> #9 0x8070e460 in ipfw_chk (args=0xe70175fc) at ../../../netinet/ip_fw2.c:1894 AD> 1894sa.sin_len = 8; AD> (kgdb) list AD> 1889struct sockaddr_in sa; AD> 1890 AD> 1891if (tbl >= IPFW_TABLES_MAX) AD> 1892return (0); AD> 1893rnh = ch->tables[tbl]; AD> 1894sa.sin_len = 8; ^ looks strange. On the line 1894 I expected to see rnh_lookup() call, which is two lines below. Are you sure your source matches the built kernel? AD> 1895sa.sin_addr.s_addr = addr; AD> 1896ent = (struct table_entry *)(rnh->rnh_lookup(&sa, NULL, rnh)); AD> 1897if (ent != NULL) { AD> 1898*val = ent->value; AD> (kgdb) p *cmd AD> $1 = {opcode = O_IP_SRC_LOOKUP, len = 1 '\001', arg1 = 2} AD> (kgdb) p cmd->arg1 AD> $2 = 2 It crashed looking for src IP in table 2. But from ps otput I don't see the process that could modify the table in that time. So the table might have been corrupted earlier. Unfortunately, reviewing provided info I don't have any good ideas what might have caused this. May be other people on the list could help... Recently I saw some backtrace of the crash in rn_match() too but then it was pf that was looking for IP in the table. It appeared that the guy was running ssh brute-force blocker and expiretable, which was run periodically, removed old entries from the table. He just disabled expiretable and this stopped the crashes. Actually some output from crashinfo looks suspicious. zero values for fork() calls, negative values in vmstat -m output... Does the userland where you were running crashinfo matched the crushed kernel? Also, does you kernel match userland on crashed box? And certainly it would be good to provide backtrace with full debugging info available :-). Do you remember that debugging symbols for modules are needed too? -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: 7.1-STABLE crash
On Tue, 02 Jun 2009 10:36:32 +0400 Asmodean Dark wrote: >>> Do you really need a such outdated version of mpd while mpd5 is available >>> in ports? > Yes, we need mpd3 because there are many our custom patches for it (some > RADIUS functions, bugfixes etc). > >>> Do you have some automatic blocker or some other script that periodically >>> add/remove IPs in a ipfw table? > Yes. For some our users we use ipfw-based rules (SMTP port blocking, enabling > transparent proxy and other). Mpd ifaceup script contain something like this: > if [ "$FilterId" = "no_transparent" ]; then > $IPFW table 3 add $IP > fi > > And, in ipfw.rules: > add allow tcp from table(3) to any dst-port 80 in recv ng* > > Why table can be corrupted? ipfw check added address, isn`t it? It might be that when ipfw was looking for IP in the table it was being modified by ifaceup script at that time. ipfw has lock protection so this thing should not have happened but... Could you run something like this? ps -auxl -M /path/to/vmcore -N /path/to/kernel.symbols We could look if at the moment of the crash some process was running that was adding/removing ipfw table. Other good and simple thing is to run crashinfo(8) utility and provide its output. If it runs flawlessly it should contain ps output too among other useful information. >>> Do the bt output for other crashes looks the same? > It`s a first dump obtained. There is some problems in receiving it, because > all servers is network-booted (via PXE) and have no usable dump devices :) > This dump obtained with USB flash device connected to server. Also, crashes > is not often. So we could try to get as much info from this dump as we can :-). Could you post here the whole output of kgdb from its start, not only bt (I mean things like "Reading symbols from..." and the command line itself)? Also you can try in kgdb session: fr 9 list p *cmd p cmd->arg1 p a -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: 7.1-STABLE crash
On Mon, 01 Jun 2009 10:05:46 +0400 Asmodean Dark wrote: AD> Hello, AD> We have cluster of FreeBSD VPN servers with running mpd3.18 and sometimes it crash: Do you really need a such outdated version of mpd while mpd5 is available in ports? AD> # uname -a AD> FreeBSD vpn 7.1-STABLE FreeBSD 7.1-STABLE #13: Wed Mar 18 14:53:13 YEKT 2009 r...@vpn:/usr/src/sys/i386/compile/kconf i386 AD> # dmesg AD> Fatal trap 12: page fault while in kernel mode AD> cpuid = 1; apic id = 01 AD> fault virtual address = 0x8 AD> fault code = supervisor read, page not present AD> instruction pointer = 0x20:0x806dcb88 AD> stack pointer = 0x28:0xe70775c4 AD> frame pointer = 0x28:0xe70775e8 AD> code segment= base 0x0, limit 0xf, type 0x1b AD> = DPL 0, pres 1, def32 1, gran 1 AD> processor eflags= interrupt enabled, resume, IOPL = 0 AD> current process = 22 (em0 taskq) AD> trap number = 12 AD> (kgdb) bt AD> #0 0x8063d6b0 in doadump () AD> #1 0x8063dc44 in boot () AD> #2 0x8063e0ca in panic () AD> #3 0x807dab3d in trap_fatal () AD> #4 0x807daeba in trap_pfault () AD> #5 0x807db7bd in trap () AD> #6 0x807c2a3b in calltrap () AD> #7 0x806dcb88 in rn_match () AD> #8 0x806ddc8a in rn_lookup () AD> #9 0x8070e460 in ipfw_chk (args=0xe70175fc) at ../../../netinet/ip_fw2.c:1894 AD> #10 0x80710c3d in ipfw_check_in (arg=0x0, m0=0xe7017700, ifp=0x91c5a800, dir=1, inp=0x0) at ../../../netinet/ip_fw_pfil.c:125 AD> #11 0x806dc20f in pfil_run_hooks () AD> #12 0x80713984 in ip_input (m=0x91954c00) at ../../../netinet/ip_input.c:416 AD> #13 0x806ec0d9 in ng_iface_rcvdata () AD> #14 0x806e9570 in ng_apply_item () AD> #15 0x806e8569 in ng_snd_item () AD> #16 0x806e9570 in ng_apply_item () AD> #17 0x806e8569 in ng_snd_item () AD> #18 0x806e9570 in ng_apply_item () AD> #19 0x806e8569 in ng_snd_item () AD> #20 0x806f16a7 in ng_ppp_proto_recv () AD> #21 0x806f3ed2 in ng_ppp_rcvdata () AD> #22 0x806e9570 in ng_apply_item () AD> #23 0x806e8569 in ng_snd_item () AD> #24 0x806e9570 in ng_apply_item () AD> #25 0x806e8569 in ng_snd_item () AD> #26 0x806ee3c3 in ng_ksocket_incoming2 () AD> #27 0x806e969d in ng_apply_item () AD> #28 0x806ea8aa in ngintr () AD> #29 0x806dab72 in swi_net () AD> #30 0x8061e265 in ithread_loop () AD> #31 0x8061adf5 in fork_exit () AD> #32 0x807c2ab0 in fork_trampoline () It looks like the kernel crashed when ipfw was looking for the packet's src/dst in a table. The table might have been corrupted (being modified?) at that time. Do you have some automatic blocker or some other script that periodically add/remove IPs in a ipfw table? AD> What can I do with it? Are additional info needed? It is a bit strange for me that in the bt output the file source information is displayed only for several functions. Do you have the kernel and all modules built with the debugging symbols? If you don't I would recommend to rebuild the kernel so I would be able to provide bt (from the next crash :-) with all necessary info available. Do the bt output for other crashes looks the same? -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Memory leak on thread removal
On Sat, 16 May 2009 20:24:09 +0200 Marius Nünnerich wrote: >> http://freshmeat.net/projects/lmdbg >> >> This is a small memory leak debugger. It does not provide all functionality >> you can find in more sophisticated tools but is lightweight, portable and >> simple in use. It was very useful when I traced this bug. MN> Thanks, I'll take a look at it. Today I submitted lmdbg port. http://www.freebsd.org/cgi/query-pr.cgi?pr=134617 At present it is waiting to be committed in ports tree, but you can use shar from the PR to build the port yourself. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Memory leak on thread removal
On Tue, 12 May 2009 09:27:30 +0300 Mikolaj Golub wrote: MG> Hi, MG> The code below is compiled with -fopenmp and run on FreeBSD6/7 (i386, amd64): MG> #include MG> #include MG> int n = 4, m = 2; MG> int main () { MG> for (;;) { MG> int i; MG> //sleep(2); MG> #pragma omp parallel for num_threads(m) MG> for(i = 0; i < 1; i++) {} MG> //sleep(2); MG> #pragma omp parallel for num_threads(n) MG> for(i = 0; i < 1; i++) {} MG> MG> } MG> return 0; MG> } MG> During the run the program's virtual memory usage constantly grows. The growth MG> is observed only when n != m. When running the program with uncommented MG> sleep() and observing the number of threads with 'top -H' I see in turn 2 or 4 MG> threads. So it looks like memory leak when thread is removed. Should I fill MG> PR? Reported. http://www.freebsd.org/cgi/query-pr.cgi?pr=134604 -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Memory leak on thread removal
On Fri, 15 May 2009 13:48:51 +0200 Marius Nünnerich wrote: MN> On Tue, May 12, 2009 at 08:27, Mikolaj Golub wrote: >> Hi, >> >> The code below is compiled with -fopenmp and run on FreeBSD6/7 (i386, >> amd64): >> >> #include >> #include >> >> int n = 4, m = 2; >> >> int main () { >> for (;;) { >> int i; >> >> //sleep(2); >> #pragma omp parallel for num_threads(m) >> for(i = 0; i < 1; i++) {} >> >> //sleep(2); >> #pragma omp parallel for num_threads(n) >> for(i = 0; i < 1; i++) {} >> >> } >> >> return 0; >> } >> >> During the run the program's virtual memory usage constantly grows. The >> growth >> is observed only when n != m. When running the program with uncommented >> sleep() and observing the number of threads with 'top -H' I see in turn 2 >> or 4 >> threads. So it looks like memory leak when thread is removed. Should I fill >> PR? It looks like I have found the leak. The problem is in libgomp/team.c. gomp_thread_start() does sem_init() but sem_destroy() is never called. This patch solves the problem for me: --- contrib/gcclibs/libgomp/team.c.orig 2009-05-16 17:32:57.0 +0300 +++ contrib/gcclibs/libgomp/team.c 2009-05-16 19:16:37.0 +0300 @@ -164,9 +164,12 @@ new_team (unsigned nthreads, struct gomp static void free_team (struct gomp_team *team) { + int i; free (team->work_shares); gomp_mutex_destroy (&team->work_share_lock); gomp_barrier_destroy (&team->barrier); + for(i = 1; i < team->nthreads; i++) +gomp_sem_destroy (team->ordered_release[i]); gomp_sem_destroy (&team->master_release); free (team); } I am going to fill PR to gcc mainstream, but should I also register this in FreeBSD bugtrack as gcc is part of the base? BTW, the problem is not observed under Linux. I have not looked in Linux code but it looks like sem_init() implementation for Linux does not do memory allocation. The memory for the test program below grows under FreeBSD and does not under Linux. #include int main(int argc, char *argv[]) { sem_t sem; for(;;) { sem_init(&sem, 0, 0);} return 0; } MN> I can confirm this. I briefly looked through the libgomp code but MN> didn't see the leak. Anybody knows good tools how to investigate this? http://freshmeat.net/projects/lmdbg This is a small memory leak debugger. It does not provide all functionality you can find in more sophisticated tools but is lightweight, portable and simple in use. It was very useful when I traced this bug. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Memory leak on thread removal
Hi, The code below is compiled with -fopenmp and run on FreeBSD6/7 (i386, amd64): #include #include int n = 4, m = 2; int main () { for (;;) { int i; //sleep(2); #pragma omp parallel for num_threads(m) for(i = 0; i < 1; i++) {} //sleep(2); #pragma omp parallel for num_threads(n) for(i = 0; i < 1; i++) {} } return 0; } During the run the program's virtual memory usage constantly grows. The growth is observed only when n != m. When running the program with uncommented sleep() and observing the number of threads with 'top -H' I see in turn 2 or 4 threads. So it looks like memory leak when thread is removed. Should I fill PR? -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: diagnosing freezes (DRI?)
On Sat, 11 Apr 2009 11:15:59 +0100 xorquew...@googlemail.com wrote: x> On 2009-04-11 02:30:40, Paul B. Mahol wrote: >> >> If it locks under X11 then use debug.debugger_on_panic=0 sysctl. >> Not doing this will increase drasticaly chances of locking whole system >> and not providing any debug data. x> I don't seem to have that sysctl. You will see this sysctl only if you build your kernel with ddb(4) support. If you are interested in providing useful information about your freezes, please read the following: http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html You need to build your kernel with options described in http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-online-ddb.html and http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html As you run X, you need to have debug.debugger_on_panic=0 set (as Paul has suggested). Otherwise ddb would enter on panic but you wouldn't be able to access it due to X. After panic you will be able to get useful information from generated core dump using kgdb. Another option is to set debug.debugger_on_panic=1 but also set some ddb script that will run when the kernel debugger is entered as a result of a panic. This script will enable output capture, dump some useful debugging info to capture buffer, and then force a kernel dump to be written out followed by a reboot. E.g. running something like this will do the trick: ddb script 'kdb.enter.panic=capture on; show pcpu; show allpcpu; bt; ps; show locks; show alllocks; show lockedvnods; alltrace; call doadump; reset' After reboot you can extract captured information from the capture buffer information using the command: ddb capture -M /var/crash/vmcore.X print > ddb.out You need to increase the value of debug.ddb.capture.bufsize sysctl variable to make sure all ddb output will be captured. You can read more about this in ddb(4), ddb(8), textdump(4). -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
How to calculate current kmem usage?
Hi, Could someone explain please, how to calculate current kernel memory utilization, one that is limited by vm.kmem_size? There is a script on http://wiki.freebsd.org/ZFSTuningGuide that calculates kernel memory utilization by summing the values from `kldstat' output (TEXT) and the values from `vmstat -m' output (DATA). Are these the only data needed for proper calculation of kmem? What about zone(9) allocations? Shouldn't data from `vmstat -z' output be added to calculate kmem usage? The reason I am asking about this is that we are tuning vfs.ufs.dirhash_maxmem on our storage servers. By default it is 2Mb that looks like very small value. We increased it to 30Mb and all 30Mb were filled very quickly, so we are considering to increase it more but we need the method to monitor the system resources we can hit (we use the default value for vm.kmem_size 300Mb that is not so large). So what the system parameters we should monitor increasing vfs.ufs.dirhash_maxmem? I see the growth of dirhash_maxmem corresponds the growth of wired memory. Currently wired is 222M on this host. Isn't wired memory limited by vm.kmem_size or it is limited only by vm.kvm_size? BTW, how reasonably large the value of vfs.ufs.dirhash_maxmem can be? I have seen recommendations to increase it until it all in usage, but may be there are other considerations I should take into account? We use rsync on our storage servers to synchronize data between the hosts and I suppose this is the main dirhash_mem eater. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Socket leak
On Wed, 14 May 2008 09:46:35 -0400 Mark Saad wrote: MS> Mikolaj MS> Thanks for the input, did you change any of the options for MS> TimeoutLinger or TimeoutIdle ? No, I didn't MS> The Proftpd I am running is build for 6.3-RELEASE here are the build MS> options MS> Compile-time Settings: MS> Version: 1.3.0a MS> Platform: FREEBSD6 (FREEBSD6_3) MS> Built With: MS>configure CPPFLAGS=-DHAVE_OPENSSL --localstatedir=/var/run MS> --disable-sendfile --disable-ipv6 MS> --with-modules=mod_sql:mod_sql_mysql:mod_check_mysql:mod_check_digest MS> --prefix=/usr/local MS> --with-includes=/usr/local/include/mysql:/usr/include/openssl MS> --with-libraries=/usr/local/lib/mysql It might be that it is not proftpd but other application that cause the leak. Anyway, to check if it is proftpd, look in its logs for entries like these: Entering Passive Mode (192,168,0,213,241,70). FTP session closed. Convert the last two numbers to port (241*256+70) and check by netstat if you still have this connection. If you have, then it is likely this is the same situation as in my case and the proftpd is a problem. Upgrade to 1.3.1 from ports then. If proftpd is ok, look for other applications. Search for connections reported by netstat as ESTABLISHED but not displayed by sockstat utility. You could run something like this: netstat -an | grep ESTABL | while read b l a local remote state; do echo -n "$local $remote: " sockstat | sed -e 's/:/./g' | grep -c "$local *$remote" done Look for sockets with 0 count. These are suspicious ones. Observe these sockets by netstat and try to figure out what application they could belong and dig in that direction. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Socket leak
On Tue, 13 May 2008 19:37:29 -0400 Mark Saad wrote: MS> I started logging the values of kern.ipc.numopensockets and I noticed MS> that something is leaking sockets. Here is a sample of the log MS> 2008-04-29--15:04.10 kern.ipc.numopensockets: 1501 MS> 2008-04-29--16:04.01 kern.ipc.numopensockets: 1535 MS> 2008-04-29--17:04.00 kern.ipc.numopensockets: 1617 MS> 2008-04-29--18:04.00 kern.ipc.numopensockets: 1710 MS> This continues until kern.ipc.maxsockets its reached or the box is MS> rebooted. MS> The other thing we looked at was the output from vmstat -z MS> The first thing was the high amount of malloc 128 bucket failures MS> 128 Bucket:524,0, 2489, 80, 8364, 23055239 MS> I also logged the mbuf clusters, we never reached the max mbuf clusters MS> Its almost like there are stale sockets. Here is a snapshot of the server now MS> ewr# sockstat -4u |wc -l MS> 139 MS> ewr# sysctl kern.ipc.numopensockets MS> kern.ipc.numopensockets: 13935 MS> ewr# uptime MS> 7:30PM up 6 days, 26 mins, 3 users, load averages: 0.18, 0.25, 0.17 We had the same problem on one of hosts running 6.2-RELEASE-p11. The situation was complicated by the fact that I didn't have root access to the host and there were problems with getting more debugging or running tcpdump. Eventually, it appeared that problem was caused by proftpd. One of our clients connected to ftp server every five minutes looking for new file to download. When there was the file everything was good. But when there wasn't, some strange things happened. In proftpd logs we had: FTP session opened. mod_delay/0.5: delaying for 28 usecs user fake authenticated by mod_auth_pam.c USER fake: Login successful. Preparing to chroot to directory '/var/ftp/fake' Environment successfully chroot()ed. mod_delay/0.5: delaying for 621 usecs Entering Passive Mode (XX,YY,ZZ,213,241,70). FTP session closed. i.e. the client connected to server, had login successful, created new DATA connection in passive mode and then exited. But although proftpd reported that connection closed and proftpd process exited we still had this orphaned connection in our system reported by netstat in ESTABLISHED state. sockstat did not display this connections. Some of these connections could be in CLOSE_WAIT mode instead of ESTABLISHED. Such connection was seen by netstat for several hours and then disappeared but I suspect that the socket buffer was not freed and numopensockets counter did not decrease. Unfortunately, I did not managed to persuade admin to increase DebugLevel in proftpd.conf and run tcpdump to investigate more what was going on. It turned out that we had proftpd built for FREEBSD5_4: Compile-time Settings: Version: 1.3.0 Platform: FREEBSD5 (FREEBSD5_4) Built With: configure --localstatedir=/var/run --sysconfdir=/usr/local/etc --disable-sendfile --disable-ipv6 --with-modules=mod_ratio:mod_readme:mod_rewrite:mod_wrap:mod_ifsession --prefix=/usr/local i386-portbld-freebsd5.4 Upgrade to more recent proftpd built for proper platform resolved the problem. So I would recommend to look for process that could cause this leak. In my case careful investigation of netstat output history and comparing with sockstat output helped to find guilty. May be it would help to restart daemons one by one and see if sockets are freed. You can surely increase kern.ipc.maxsockets as workaround until you identify what causes the problem. -- Mikolaj Golub ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"