Paul Alfille <paul.alfi...@gmail.com> writes: > Great, a nice test case. Should be fixable. I like the shell script approach.
... > The problem seems to be that owserver is relatively fine as long as > the requests come in an orderly fashion, such as in this script: ... > But if I change the script to requesting everything at the same time, > like this: > > #!/bin/sh > > while true; do > owget -s 3000 uncached/10.7F1914000800/temperature & > owget -s 3000 20.E0C700000000/volt.ALL & > owget -s 3000 uncached/1D.0A5201000000/counters.A & > owget -s 3000 10.7F1914000800/temperature & > owget -s 3000 26.22FB4F000000/temperature & > owget -s 3000 26.22FB4F000000/VDD & > sleep 5 > done > > then owserver breaks within seconds, owget processes pile up because > owserver doesn't respond for some files. Eventally owserver might crash. Anything I can do to debug this? It's easy to reproduce. Owserver seems pretty fragile, i.e. it'll crash under some load. I see a few instances of SIGPIPE signals, I'm not sure whether this is a problem or gdb catching what owserver otherwise will catch and handle. This might be caused by a client terminating prematurely. And I also get abort()s, which seems like something else. Here's some gdb output while running owserver: Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x458f3950 (LWP 32021)] 0x00007f36fdbe78e3 in writev () from /lib/libc.so.6 (gdb) bt #0 0x00007f36fdbe78e3 in writev () from /lib/libc.so.6 #1 0x0000000000402d27 in ToClient (file_descriptor=9, machine_order_cm=0x458f2ea0, data=0x0) at to_client.c:80 #2 0x000000000040503d in SingleHandler (hd=0x458f2f90) at handler.c:215 #3 0x0000000000404b2b in Handler (file_descriptor=9) at handler.c:119 #4 0x00007f36fe789faa in ServerProcessHandler (arg=0x60bd80) at ow_net_server.c:158 #5 0x00007f36fde7f3f7 in start_thread () from /lib/libpthread.so.0 #6 0x00007f36fdbeeb3d in clone () from /lib/libc.so.6 #7 0x0000000000000000 in ?? () (gdb) up #1 0x0000000000402d27 in ToClient (file_descriptor=9, machine_order_cm=0x458f2ea0, data=0x0) at to_client.c:80 80 ret = writev(file_descriptor, io, nio) != (ssize_t) (io[0].iov_len + io[1].iov_len); Similar: Writev byte buffer ionr=1/2 length=24 --000: 00 00 00 00 00 00 00 0C 00 00 00 0C 00 00 01 01 --016: 00 00 00 0C 00 00 00 00 <........................> Writev byte buffer ionr=2/2 length=12 --000: 20 20 20 20 20 20 20 20 34 2E 38 31 < 4.81> Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x43ec1950 (LWP 23028)] 0x00007f5cf2c008e3 in writev () from /lib/libc.so.6 (gdb) bt #0 0x00007f5cf2c008e3 in writev () from /lib/libc.so.6 #1 0x0000000000402d27 in ToClient (file_descriptor=10, machine_order_cm=0x43ec1050, data=0x60a9b0 " 4.81") at to_client.c:80 #2 0x00000000004046fb in DataHandler (v=0x47ec8f90) at data.c:180 #3 0x00007f5cf2e983f7 in start_thread () from /lib/libpthread.so.0 #4 0x00007f5cf2c07b3d in clone () from /lib/libc.so.6 #5 0x0000000000000000 in ?? () A crash: [New Thread 0x44210950 (LWP 27589)] CALL: data.c:DataHandler(125) Read message DEBUG: read.c:ReadHandler(55) ReadHandler: DEBUG: read.c:ReadHandler(61) ReadHandler: From Client sm->payload=25 sm->size=51 sm->offset=0 DEBUG: read.c:ReadHandler(81) ReadHandler: call FS_read_postparse on /20.E0C700000000/volt.ALL DEBUG: ow_read.c:FS_read_postparse(78) /20.E0C700000000/volt.ALL DEBUG: ow_read.c:FS_read_distribute(196) /20.E0C700000000/volt.ALL DEBUG: ow_read.c:FS_r_given_bus(229) start OWQ OneWireQuery structure of /20.E0C700000000/volt.ALL OneWireQuery size=51 offset=0, extension=-1 Byte buffer OneWireQuery buffer, length=51 --000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --016: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --032: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --048: 00 00 00 <...................................................> OneWireQuery I=6335920 U=6335920 F=3.13036E-317 Y=6335920 D=Sun Mar 15 08:58:40 1970 --- OneWireQuery done *** glibc detected *** /usr/local/bin/owserver: malloc(): memory corruption (fast): 0x000000000060c3a0 *** CALL: data.c:DataHandler(102) DataHandler: parse path=uncached/1D.0A5201000000/counters.A DEBUG: ow_parseobject.c:FS_OWQ_create(26) uncached/1D.0A5201000000/counters.A ======= Backtrace: ========= CALL: ow_parsename.c:FS_ParsedName_anywhere(115) path=[uncached/1D.0A5201000000/counters.A] DEBUG: ow_cache.c:Cache_Get_Device(856) 1D 0A 52 01 00 00 00 95 /lib/libc.so.6[0x7f9c544003b2] /lib/libc.so.6(__libc_malloc+0x90)[0x7f9c54401360] DEBUG: ow_cache.c:Cache_Get_Common(1086) Get from cache sn 1D 0A 52 01 00 00 00 95 pointer=0x7f9c55251cc0 index=0 size=4 /lib/libc.so.6(tsearch+0x10a)[0x7f9c544610ea] /usr/local/lib/libow-2.7.so.21(LockGet+0x160)[0x7f9c54ffb3a1] /usr/local/lib/libow-2.7.so.21[0x7f9c55007130] /usr/local/lib/libow-2.7.so.21(FS_read_distribute+0x212)[0x7f9c55006de7] /usr/local/lib/libow-2.7.so.21[0x7f9c5500671e] /usr/local/lib/libow-2.7.so.21(FS_read_postparse+0x2cc)[0x7f9c55006598] /usr/local/bin/owserver[0x4036cf] /usr/local/bin/owserver[0x4042dd] /lib/libpthread.so.0[0x7f9c546f33f7] DEBUG: ow_cache.c:Cache_Get_Common(1099) value found in cache. Remaining life: 115 seconds. DEBUG: ow_presence.c:CheckPresence(75) Found device on bus 0 DEBUG: ow_parsename.c:FS_ParsedName_anywhere(199) Dirlength=26 </uncached/1D.0A5201000000/counters.A> CALL: data.c:DataHandler(125) Read message DEBUG: read.c:ReadHandler(55) ReadHandler: DEBUG: read.c:ReadHandler(61) ReadHandler: From Client sm->payload=36 sm->size=12 sm->offset=0 DEBUG: read.c:ReadHandler(81) ReadHandler: call FS_read_postparse on /uncached/1D.0A5201000000/counters.A DEBUG: ow_read.c:FS_read_postparse(78) /uncached/1D.0A5201000000/counters.A DEBUG: ow_read.c:FS_read_distribute(196) /uncached/1D.0A5201000000/counters.A DEBUG: ow_read.c:FS_r_given_bus(229) start OWQ OneWireQuery structure of /uncached/1D.0A5201000000/counters.A OneWireQuery size=12 offset=0, extension=0 01000000/counters.A Program received signal SIGABRT, Aborted. [Switching to Thread 0x48218950 (LWP 27587)] 0x00007f9c543bd095 in raise () from /lib/libc.so.6 (gdb) bt #0 0x00007f9c543bd095 in raise () from /lib/libc.so.6 #1 0x00007f9c543beaf0 in abort () from /lib/libc.so.6 #2 0x00007f9c543f7a7b in ?? () from /lib/libc.so.6 #3 0x00007f9c544003b2 in ?? () from /lib/libc.so.6 #4 0x00007f9c54401360 in malloc () from /lib/libc.so.6 #5 0x00007f9c544610ea in tsearch () from /lib/libc.so.6 #6 0x00007f9c54ffb3a1 in LockGet (pn=0x482180a8) at ow_locks.c:173 #7 0x00007f9c55007130 in FS_r_given_bus (owq=0x48218090) at ow_read.c:244 #8 0x00007f9c55006de7 in FS_read_distribute (owq=0x48218090) at ow_read.c:206 #9 0x00007f9c5500671e in FS_read_real (owq=0x48218090) at ow_read.c:109 #10 0x00007f9c55006598 in FS_read_postparse (owq=0x48218090) at ow_read.c:87 #11 0x00000000004036cf in ReadHandler (hd=0x46213f90, cm=0x48218050, owq=0x48218090) at read.c:86 #12 0x00000000004042dd in DataHandler (v=0x46213f90) at data.c:126 #13 0x00007f9c546f33f7 in start_thread () from /lib/libpthread.so.0 #14 0x00007f9c54462b3d in clone () from /lib/libc.so.6 #15 0x0000000000000000 in ?? () (gdb) up #1 0x00007f9c543beaf0 in abort () from /lib/libc.so.6 (gdb) up #2 0x00007f9c543f7a7b in ?? () from /lib/libc.so.6 (gdb) up #3 0x00007f9c544003b2 in ?? () from /lib/libc.so.6 (gdb) up #4 0x00007f9c54401360 in malloc () from /lib/libc.so.6 (gdb) up #5 0x00007f9c544610ea in tsearch () from /lib/libc.so.6 (gdb) up #6 0x00007f9c54ffb3a1 in LockGet (pn=0x482180a8) at ow_locks.c:173 173 if ((opaque = (struct dev_opaque *)tsearch(dlock, &(pn->selected_connection->dev_db), dev_compare)) == NULL) { // unfound and uncreatable (gdb) print dlock $1 = (struct devlock *) 0x60c240 (gdb) print *dlock $2 = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\0' <repeats 16 times>, "", '\0' <repeats 19 times>, __align = 0}, sn = " \000\000\000\000", users = 0} Here's the output from owserver when I do an owget which never finishes, but owserver keeps running: owget -s 3000 26.22FB4F000000/VDD DEBUG: ow_net_server.c:ServerProcessAccept(203) 3000[1082530128] accept 0 fd=17 DEBUG: ow_net_server.c:ServerProcessAcceptUnlock(177) unlock 1082530128 DEBUG: ow_net_server.c:ServerProcessAcceptUnlock(179) unlock 1082530128 done DEBUG: ow_net_server.c:ServerProcessAccept(216) 3000[1082530128] unlock 0 DEBUG: ow_tcp_read.c:tcp_read(60) attempt 24 bytes Time:(10,0) Byte buffer NETREAD, length=24 --000: 00 00 00 00 00 00 00 14 00 00 00 07 00 00 01 0A --016: 00 00 00 00 00 00 00 00 <........................> DEBUG: ow_tcp_read.c:tcp_read(108) n=24 nleft=0 n-nleft=24 DEBUG: from_client.c:FromClient(65) FromClient payload=20 size=0 type=7 sg=0x10A offset=0 DEBUG: from_client.c:FromClient(73) FromClient (no servermessage) payload=20 size=0 type=7 sg=0x10A offset=0 DEBUG: ow_tcp_read.c:tcp_read(60) attempt 20 bytes Time:(10,0) Byte buffer NETREAD, length=20 --000: 32 36 2E 32 32 46 42 34 46 30 30 30 30 30 30 2F --016: 56 44 44 00 <26.22FB4F000000/VDD.> DEBUG: ow_tcp_read.c:tcp_read(108) n=20 nleft=0 n-nleft=20 CALL: data.c:DataHandler(102) DataHandler: parse path=26.22FB4F000000/VDD DEBUG: ow_parseobject.c:FS_OWQ_create(26) 26.22FB4F000000/VDD CALL: ow_parsename.c:FS_ParsedName_anywhere(115) path=[26.22FB4F000000/VDD] DEBUG: ow_cache.c:Cache_Get_Device(856) 26 22 FB 4F 00 00 00 15 DEBUG: ow_cache.c:Cache_Get_Common(1086) Get from cache sn 26 22 FB 4F 00 00 00 15 pointer=0x7f855d33c420 index=0 size=4 DEBUG: ow_cache.c:Cache_Get_Common(1099) value found in cache. Remaining life: 57 seconds. DEBUG: ow_presence.c:CheckPresence(75) Found device on bus 0 DEBUG: ow_parsename.c:FS_ParsedName_anywhere(199) Dirlength=17 </26.22FB4F000000/VDD> CALL: data.c:DataHandler(145) Directory message (all at once) DEBUG: dirall.c:DirallHandler(66) OWSERVER Dir-All SpecifiedBus=0 path = /26.22FB4F000000/VDD DEBUG: ow_dir.c:FS_dir_remote(74) path=/26.22FB4F000000/VDD CALL: ow_dir.c:FS_dir_both(102) path=/26.22FB4F000000/VDD DEBUG: ow_dir.c:FS_dir_both(186) ret=-20 DEBUG: ow_parseobject.c:FS_OWQ_destroy(141) /26.22FB4F000000/VDD DEBUG: ow_parsename.c:FS_ParsedName_destroy(64) /26.22FB4F000000/VDD DEBUG: data.c:DataHandler(162) DataHandler: FS_ParsedName_destroy done DEBUG: data.c:DataHandler(176) DataHandler: cm.ret=-20 DEBUG: to_client.c:ToClient(60) payload=0 size=0, ret=-20, sg=0x10A offset=0 Writev byte buffer ionr=1/1 length=24 --000: 00 00 00 00 00 00 00 00 FF FF FF EC 00 00 01 0A --016: 00 00 00 00 00 00 00 00 <........................> DEBUG: data.c:DataHandler(193) DataHandler: done DEBUG: ow_net_server.c:ServerProcessAccept(243) 1082530128 CLOSING DEBUG: ow_net_server.c:ServerProcessAccept(189) 3000[1082530128] try lock 0 DEBUG: ow_net_server.c:ServerProcessAccept(195) 3000[1082530128] locked 0 DEBUG: ow_net_server.c:ServerProcessAccept(203) 3000[1082530128] accept 0 fd=5 DEBUG: ow_net_server.c:ServerProcessAcceptUnlock(177) unlock 1082530128 DEBUG: ow_net_server.c:ServerProcessAcceptUnlock(179) unlock 1082530128 done DEBUG: ow_net_server.c:ServerProcessAccept(216) 3000[1082530128] unlock 0 DEBUG: ow_tcp_read.c:tcp_read(60) attempt 24 bytes Time:(10,0) Byte buffer NETREAD, length=24 --000: 00 00 00 00 00 00 00 14 00 00 00 02 00 00 01 0A --016: 00 00 1F A0 00 00 00 00 <........................> DEBUG: ow_tcp_read.c:tcp_read(108) n=24 nleft=0 n-nleft=24 DEBUG: from_client.c:FromClient(65) FromClient payload=20 size=8096 type=2 sg=0x10A offset=0 DEBUG: from_client.c:FromClient(73) FromClient (no servermessage) payload=20 size=8096 type=2 sg=0x10A offset=0 DEBUG: ow_tcp_read.c:tcp_read(60) attempt 20 bytes Time:(10,0) Byte buffer NETREAD, length=20 --000: 32 36 2E 32 32 46 42 34 46 30 30 30 30 30 30 2F --016: 56 44 44 00 <26.22FB4F000000/VDD.> DEBUG: ow_tcp_read.c:tcp_read(108) n=20 nleft=0 n-nleft=20 CALL: data.c:DataHandler(102) DataHandler: parse path=26.22FB4F000000/VDD DEBUG: ow_parseobject.c:FS_OWQ_create(26) 26.22FB4F000000/VDD CALL: ow_parsename.c:FS_ParsedName_anywhere(115) path=[26.22FB4F000000/VDD] DEBUG: ow_cache.c:Cache_Get_Device(856) 26 22 FB 4F 00 00 00 15 DEBUG: ow_cache.c:Cache_Get_Common(1086) Get from cache sn 26 22 FB 4F 00 00 00 15 pointer=0x7f855d33c420 index=0 size=4 DEBUG: ow_cache.c:Cache_Get_Common(1099) value found in cache. Remaining life: 57 seconds. DEBUG: ow_presence.c:CheckPresence(75) Found device on bus 0 DEBUG: ow_parsename.c:FS_ParsedName_anywhere(199) Dirlength=17 </26.22FB4F000000/VDD> CALL: data.c:DataHandler(125) Read message DEBUG: read.c:ReadHandler(55) ReadHandler: DEBUG: read.c:ReadHandler(61) ReadHandler: From Client sm->payload=20 sm->size=8096 sm->offset=0 DEBUG: read.c:ReadHandler(81) ReadHandler: call FS_read_postparse on /26.22FB4F000000/VDD DEBUG: ow_read.c:FS_read_postparse(78) /26.22FB4F000000/VDD DEBUG: ow_read.c:FS_read_distribute(196) /26.22FB4F000000/VDD DEBUG: ow_read.c:FS_r_given_bus(229) start OWQ OneWireQuery structure of /26.22FB4F000000/VDD OneWireQuery size=8096 offset=0, extension=0 DEBUG: ow_net_server.c:ServerProcessAccept(243) 1082530128 CLOSING Byte buffer OneWireQuery buffer, length=8096 --000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --016: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --032: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --048: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <................................................................> OneWireQuery I=0 U=0 F=0 Y=0 D=Thu Jan 1 01:00:00 1970 --- OneWireQuery done DEBUG: ow_net_server.c:ServerProcessAccept(189) 3000[1082530128] try lock 0 DEBUG: ow_net_server.c:ServerProcessAccept(195) 3000[1082530128] locked 0 DEBUG: handler.c:Handler(150) OWSERVER handler done DEBUG: ow_net_server.c:ServerProcessHandler(165) Normal exit. DEBUG: to_client.c:ToClient(56) Send delay message DEBUG: to_client.c:ToClient(60) payload=-1 size=0, ret=0, sg=0x0 offset=0 DEBUG: to_client.c:ToClient(56) Send delay message DEBUG: to_client.c:ToClient(60) payload=-1 size=0, ret=0, sg=0x0 offset=0 DEBUG: to_client.c:ToClient(56) Send delay message ... etc ... -- Steinar ; NIL DIFFICILE VOLENTI ; http://latinitas.org ; http://voksenlia.net ------------------------------------------------------------------------------ OpenSolaris 2009.06 is a cutting edge operating system for enterprises looking to deploy the next generation of Solaris that includes the latest innovations from Sun and the OpenSource community. Download a copy and enjoy capabilities such as Networking, Storage and Virtualization. Go to: http://p.sf.net/sfu/opensolaris-get _______________________________________________ Owfs-developers mailing list Owfs-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/owfs-developers