kevent and unix dgram socket problem

2007-04-03 Thread Jason Carroll

Hi everyone--

I'm working on an application that is attempting to use kqueues to
detect data arriving at a unix domain datagram socket, but kevents
don't appear to get delivered when a datagram arrives.  Using poll()
for the same purpose appears to work fine.  Also, if I switch the
socket to the AF_INET domain, I see the correct behavior with
kevent().

I distilled the problem into two files that I included.  listen.cc
creates a unix socket and blocks for data on a kevent() call.
write.cc sends a brief message to the same unix socket.

I've seen the problem on 6-STABLE and 4.5-RELEASE.

Anyone have any thoughts or comments?

Thanks,
Jason

// ===== listen.cc =====

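// headers this listing needs (the original names were lost in the archive)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>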

#define LISTENQ 2
#define UN_PATH_LEN sizeof(((struct sockaddr_un*)0)->sun_path)

int main(int argc, char *argv[])
{
    // new socket
    int fd = socket(AF_LOCAL, SOCK_DGRAM, 0);
    assert(fd >= 0);

    // make sure there isn't something in its way
    unlink("usock");

    // create the local address, bind & listen
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_LOCAL;
    strncpy(addr.sun_path, "usock", UN_PATH_LEN - 1);
    assert(bind(fd, (sockaddr*) &addr, sizeof(sockaddr_un)) == 0);
    assert(listen(fd, LISTENQ) == 0);

    char buf[1024];
    int nread;

    // uncomment these lines to prove my socket is set up correctly
    // nread = read(fd, buf, sizeof(buf));
    // printf("read %d bytes\n", nread);

    // register the socket for read events on a new kqueue
    int kqueueFD;
    kqueueFD = kqueue();
    struct kevent event;

    EV_SET(&event, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
    assert(kevent(kqueueFD, &event, 1, 0, 0, 0) == 0);

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int r;

    // uncomment the following two lines to see poll behavior
    // while ((r = poll(&pfd, 1, INFTIM)) >= 0) {
    //     printf("poll returned %d\n", r);

    // the following two lines show kqueue behavior
    // (comment them out when using the poll loop above)
    while ((r = kevent(kqueueFD, 0, 0, &event, 1, 0)) >= 0) {
        printf("kevent returned %d\n", r);

        nread = read(fd, buf, sizeof(buf));
        printf("read %d bytes\n", nread);
    }
}

// ===== write.cc =====

// headers this listing needs (the original names were lost in the archive)
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define LISTENQ 2
#define UN_PATH_LEN sizeof(((struct sockaddr_un*)0)->sun_path)

int main(int argc, char *argv[])
{
    int fd = socket(AF_LOCAL, SOCK_DGRAM, 0);
    assert(fd >= 0);

    // create the local address & "connect"
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_LOCAL;
    strncpy(addr.sun_path, "usock", UN_PATH_LEN - 1);
    assert(connect(fd, (sockaddr*) &addr, sizeof(sockaddr_un)) == 0);

    const char *msg = "this is the message\n";
    write(fd, msg, strlen(msg));
}
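
For comparison, the poll() path that reportedly works can be exercised in a single process. The following is only a sketch under a few assumptions: the function name poll_for_datagram and the socket path passed to it are hypothetical, it uses sendto() in place of connect()/write(), and it omits the listen() call, since listen() is defined for connection-oriented socket types. It does not use kqueue at all, so it is a baseline for the datagram setup rather than a test of the kevent behavior.

```cpp
#include <cassert>
#include <cstring>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

// Bind a unix datagram socket at `path` (hypothetical name), send `msg`
// to it from a second socket, and wait for readability with poll().
// Returns the number of bytes read, or -1 if any step fails.
ssize_t poll_for_datagram(const char *path, const char *msg)
{
    int rx = socket(AF_UNIX, SOCK_DGRAM, 0);   // AF_UNIX is the same family as AF_LOCAL
    int tx = socket(AF_UNIX, SOCK_DGRAM, 0);
    ssize_t result = -1;

    if (rx >= 0 && tx >= 0) {
        // make sure nothing is left over at the socket path
        unlink(path);

        struct sockaddr_un addr;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

        const ssize_t len = static_cast<ssize_t>(strlen(msg));
        if (bind(rx, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == 0 &&
            sendto(tx, msg, len, 0,
                   reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == len) {
            struct pollfd pfd;
            pfd.fd = rx;
            pfd.events = POLLIN;
            pfd.revents = 0;

            // wait up to one second for the datagram to become readable
            if (poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLIN)) {
                char buf[1024];
                result = read(rx, buf, sizeof(buf));
            }
        }
        unlink(path);
    }

    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return result;
}
```

Calling poll_for_datagram("usock", "this is the message\n") returns the length of the message when poll() reports the socket readable, and -1 otherwise.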
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers


freebsd-5.4-stable panics

2005-09-23 Thread Jason Carroll
Hi--

I've been working on setting up a dual-CPU, dual-core Opteron 275 machine
with freebsd-5.4-stable, but have been getting panics and reboots fairly
consistently.  I think the problem I'm seeing might be related to
this discussion:

http://groups.google.com/group/lucky.freebsd.current/browse_thread/thread/6abaddffadebfdfe/f251a4874c2be3b1?lnk=st&q=freebsd+kernel+%22trap+9%22+closef&rnum=3&hl=en#f251a4874c2be3b1

but I can't be sure.

I have several applications (on the order of 10) that each receive and
send multicast data (each listens to 6-12 multicast streams and
broadcasts 1).  They also log to disk the data they broadcast.  These
applications each join all the groups they listen to at startup, and
never explicitly leave these groups.  These applications process
500-5000 packets per second between them in our environment.  The
machine usually panics after these applications have been up and
running for 30 min to 6 hours.  Several times the panic/reboot seems
to have been triggered by an operation independent of these
applications (copying a large file off the machine or moving a
directory that contained the log files).
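
The join-at-startup pattern described above could look roughly like the following. This is only a sketch, not the actual application code: the function names make_membership and join_group, and any group address or port passed to them, are hypothetical. It joins on the default interface and never issues IP_DROP_MEMBERSHIP, so the membership is released only when the socket is closed, which for applications like these would be at process exit.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Fill `mreq` for an IPv4 group on the default interface.
// Returns false if `group` is not a valid IPv4 multicast address.
bool make_membership(const char *group, struct ip_mreq *mreq)
{
    memset(mreq, 0, sizeof(*mreq));
    if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
        return false;
    if (!IN_MULTICAST(ntohl(mreq->imr_multiaddr.s_addr)))
        return false;
    mreq->imr_interface.s_addr = htonl(INADDR_ANY);
    return true;
}

// Open a UDP socket bound to `port` and join `group` on it.
// Returns the socket descriptor, or -1 on failure.
int join_group(const char *group, unsigned short port)
{
    struct ip_mreq mreq;
    if (!make_membership(group, &mreq))
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    // allow several listeners on the same port
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in local;
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(port);
    local.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, reinterpret_cast<sockaddr *>(&local), sizeof(local)) != 0 ||
        setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) != 0) {
        close(fd);
        return -1;
    }
    return fd;   // membership is held until this descriptor is closed
}
```

An application following this pattern would call join_group() once per stream at startup and keep the returned descriptors open for its whole lifetime.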

After the first few panics, we rebuilt the kernel with trace and debug
options and have saved a few core files.

There seem to be 2 types of crashes we see with pretty different stack
traces.  What I'll call a type 1 crash, I believe, is often caused by
one of the triggers I mention above.  A type 2 crash appears to happen
spontaneously after the machine has been running for a while.

I poked around using kgdb in a core file from a type 2 crash, and it
appeared the system hung closing sockets (specifically cleaning up
multicast state, I think) while cleaning up one of our multicast
applications (note the trace through sys_exit).  There's no reason
this application should have been exiting unless it encountered some
kind of error.

I'm attaching:
dmesg.txt
kernel-conf.txt (kernel config file)
type1-core.txt (a kgdb bt from a type1/triggered crash)
type2-core.txt (a kgdb bt from a type2/spontaneous crash)

I'm happy to dig for more information, recompile with different
options, apply patches, or do anything else that might help get this
problem diagnosed and fixed!

Thanks,
Jason Carroll

dmesg.txt:
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-STABLE #1: Wed Sep 21 16:25:57 EDT 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/LOCAL-DEBUG
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Dual Core AMD Opteron(tm) Processor 275 (2190.66-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x20f12  Stepping = 2
  Features=0x178bfbff
  Features2=0x1
  AMD Features=0xe2500800,LM,3DNow+,3DNow>
  Hyperthreading: 2 logical CPUs
real memory  = 3942580224 (3759 MB)
avail memory = 3805609984 (3629 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-27 on motherboard
ioapic2  irqs 28-31 on motherboard
acpi0:  on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0:  on acpi0
acpi_throttle0:  on cpu0
cpu1:  on acpi0
cpu2:  on acpi0
cpu3:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib1:  at device 6.0 on pci0
pci3:  on pcib1
ohci0:  mem 0xfeafc000-0xfeafcfff irq 19 at device 0.0 on pci3
usb0: OHCI version 1.0, legacy support
usb0:  on ohci0
usb0: USB revision 1.0
uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1:  mem 0xfeafd000-0xfeafdfff irq 19 at device 0.1 on pci3
usb1: OHCI version 1.0, legacy support
usb1:  on ohci1
usb1: USB revision 1.0
uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
pci3:  at device 6.0 (no driver attached)
fxp0:  port 0xbc00-0xbc3f mem 0xfeaa-0xfeab,0xfeafb000-0xfeafbfff irq 18 at device 8.0 on pci3
miibus0:  on fxp0
inphy0:  on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:e0:81:31:89:1b
isab0:  at device 7.0 on pci0
isa0:  on isab0
atapci0:  port 0xffa0-0xffaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pci0:  at device 7.2 (no driver attached)
pci0:  at device 7.3 (no driver attached)
pcib2:  at device 10.0 on pci0
pci2:  on pcib2
em0:  port 0x8880-0x88bf mem 0xfc90-0xfc93,0xfc9c-0xfc9d irq 26 at device 2.0 on pci2
em0: Ethernet address: 00:04:23:ba