Paul Alfille <paul.alfi...@gmail.com> writes:

> Great, a nice test case. Should be fixable. I like the shell script approach.


...
>     The problem seems to be that owserver is relatively fine as long as
>     the requests come in an orderly fashion, such as in this script:
...

>     But if I change the script to requesting everything at the same time,
>     like this:
>
>     #!/bin/sh
>
>     while true; do
>       owget -s 3000 uncached/10.7F1914000800/temperature &
>       owget -s 3000 20.E0C700000000/volt.ALL &
>       owget -s 3000 uncached/1D.0A5201000000/counters.A &
>       owget -s 3000 10.7F1914000800/temperature &
>       owget -s 3000 26.22FB4F000000/temperature &
>       owget -s 3000 26.22FB4F000000/VDD &
>       sleep 5
>     done
>
>     then owserver breaks within seconds, owget processes pile up because
>     owserver doesn't respond for some files.  Eventally owserver might crash.

Anything I can do to debug this?  It's easy to reproduce.  Owserver
seems pretty fragile, i.e. it'll crash under some load.

I see a few instances of SIGPIPE signals, I'm not sure whether this is
a problem or gdb catching what owserver otherwise will catch and
handle.  This might be caused by a client terminating prematurely.
And I also get abort()s, which seems like something else.

Here's some gdb output while running owserver:

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x458f3950 (LWP 32021)]
0x00007f36fdbe78e3 in writev () from /lib/libc.so.6
(gdb) bt
#0  0x00007f36fdbe78e3 in writev () from /lib/libc.so.6
#1  0x0000000000402d27 in ToClient (file_descriptor=9, 
machine_order_cm=0x458f2ea0, data=0x0) at to_client.c:80
#2  0x000000000040503d in SingleHandler (hd=0x458f2f90) at handler.c:215
#3  0x0000000000404b2b in Handler (file_descriptor=9) at handler.c:119
#4  0x00007f36fe789faa in ServerProcessHandler (arg=0x60bd80) at 
ow_net_server.c:158
#5  0x00007f36fde7f3f7 in start_thread () from /lib/libpthread.so.0
#6  0x00007f36fdbeeb3d in clone () from /lib/libc.so.6
#7  0x0000000000000000 in ?? ()
(gdb) up
#1  0x0000000000402d27 in ToClient (file_descriptor=9, 
machine_order_cm=0x458f2ea0, data=0x0) at to_client.c:80
80                     ret = writev(file_descriptor, io, nio) != (ssize_t) 
(io[0].iov_len + io[1].iov_len);


Similar:

Writev byte buffer ionr=1/2 length=24
--000: 00 00 00 00 00 00 00 0C 00 00 00 0C 00 00 01 01
--016: 00 00 00 0C 00 00 00 00
   <........................>
Writev byte buffer ionr=2/2 length=12
--000: 20 20 20 20 20 20 20 20 34 2E 38 31
   <        4.81>

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x43ec1950 (LWP 23028)]
0x00007f5cf2c008e3 in writev () from /lib/libc.so.6
(gdb) bt
#0  0x00007f5cf2c008e3 in writev () from /lib/libc.so.6
#1  0x0000000000402d27 in ToClient (file_descriptor=10, 
machine_order_cm=0x43ec1050, data=0x60a9b0 "        4.81") at to_client.c:80
#2  0x00000000004046fb in DataHandler (v=0x47ec8f90) at data.c:180
#3  0x00007f5cf2e983f7 in start_thread () from /lib/libpthread.so.0
#4  0x00007f5cf2c07b3d in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()


A crash:

[New Thread 0x44210950 (LWP 27589)]
   CALL: data.c:DataHandler(125) Read message
  DEBUG: read.c:ReadHandler(55) ReadHandler:
  DEBUG: read.c:ReadHandler(61) ReadHandler: From Client sm->payload=25 
sm->size=51 sm->offset=0
  DEBUG: read.c:ReadHandler(81) ReadHandler: call FS_read_postparse on 
/20.E0C700000000/volt.ALL
  DEBUG: ow_read.c:FS_read_postparse(78) /20.E0C700000000/volt.ALL
  DEBUG: ow_read.c:FS_read_distribute(196) /20.E0C700000000/volt.ALL
  DEBUG: ow_read.c:FS_r_given_bus(229) start
OWQ OneWireQuery structure of /20.E0C700000000/volt.ALL
    OneWireQuery size=51 offset=0, extension=-1
Byte buffer OneWireQuery buffer, length=51
--000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
--016: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
--032: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
--048: 00 00 00
   <...................................................>
    OneWireQuery I=6335920 U=6335920 F=3.13036E-317 Y=6335920 D=Sun Mar 15 
08:58:40 1970

--- OneWireQuery done
*** glibc detected *** /usr/local/bin/owserver: malloc(): memory corruption 
(fast): 0x000000000060c3a0 ***
   CALL: data.c:DataHandler(102) DataHandler: parse 
path=uncached/1D.0A5201000000/counters.A
  DEBUG: ow_parseobject.c:FS_OWQ_create(26) uncached/1D.0A5201000000/counters.A
======= Backtrace: =========
   CALL: ow_parsename.c:FS_ParsedName_anywhere(115) 
path=[uncached/1D.0A5201000000/counters.A]
  DEBUG: ow_cache.c:Cache_Get_Device(856) 1D 0A 52 01 00 00 00 95
/lib/libc.so.6[0x7f9c544003b2]
/lib/libc.so.6(__libc_malloc+0x90)[0x7f9c54401360]
  DEBUG: ow_cache.c:Cache_Get_Common(1086) Get from cache sn 1D 0A 52 01 00 00 
00 95 pointer=0x7f9c55251cc0 index=0 size=4
/lib/libc.so.6(tsearch+0x10a)[0x7f9c544610ea]
/usr/local/lib/libow-2.7.so.21(LockGet+0x160)[0x7f9c54ffb3a1]
/usr/local/lib/libow-2.7.so.21[0x7f9c55007130]
/usr/local/lib/libow-2.7.so.21(FS_read_distribute+0x212)[0x7f9c55006de7]
/usr/local/lib/libow-2.7.so.21[0x7f9c5500671e]
/usr/local/lib/libow-2.7.so.21(FS_read_postparse+0x2cc)[0x7f9c55006598]
/usr/local/bin/owserver[0x4036cf]
/usr/local/bin/owserver[0x4042dd]
/lib/libpthread.so.0[0x7f9c546f33f7]
  DEBUG: ow_cache.c:Cache_Get_Common(1099) value found in cache. Remaining 
life: 115 seconds.
  DEBUG: ow_presence.c:CheckPresence(75) Found device on bus 0
  DEBUG: ow_parsename.c:FS_ParsedName_anywhere(199) Dirlength=26 
</uncached/1D.0A5201000000/counters.A>
   CALL: data.c:DataHandler(125) Read message
  DEBUG: read.c:ReadHandler(55) ReadHandler:
  DEBUG: read.c:ReadHandler(61) ReadHandler: From Client sm->payload=36 
sm->size=12 sm->offset=0
  DEBUG: read.c:ReadHandler(81) ReadHandler: call FS_read_postparse on 
/uncached/1D.0A5201000000/counters.A
  DEBUG: ow_read.c:FS_read_postparse(78) /uncached/1D.0A5201000000/counters.A
  DEBUG: ow_read.c:FS_read_distribute(196) /uncached/1D.0A5201000000/counters.A
  DEBUG: ow_read.c:FS_r_given_bus(229) start
OWQ OneWireQuery structure of /uncached/1D.0A5201000000/counters.A
    OneWireQuery size=12 offset=0, extension=0
01000000/counters.A

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x48218950 (LWP 27587)]
0x00007f9c543bd095 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007f9c543bd095 in raise () from /lib/libc.so.6
#1  0x00007f9c543beaf0 in abort () from /lib/libc.so.6
#2  0x00007f9c543f7a7b in ?? () from /lib/libc.so.6
#3  0x00007f9c544003b2 in ?? () from /lib/libc.so.6
#4  0x00007f9c54401360 in malloc () from /lib/libc.so.6
#5  0x00007f9c544610ea in tsearch () from /lib/libc.so.6
#6  0x00007f9c54ffb3a1 in LockGet (pn=0x482180a8) at ow_locks.c:173
#7  0x00007f9c55007130 in FS_r_given_bus (owq=0x48218090) at ow_read.c:244
#8  0x00007f9c55006de7 in FS_read_distribute (owq=0x48218090) at ow_read.c:206
#9  0x00007f9c5500671e in FS_read_real (owq=0x48218090) at ow_read.c:109
#10 0x00007f9c55006598 in FS_read_postparse (owq=0x48218090) at ow_read.c:87
#11 0x00000000004036cf in ReadHandler (hd=0x46213f90, cm=0x48218050, 
owq=0x48218090) at read.c:86
#12 0x00000000004042dd in DataHandler (v=0x46213f90) at data.c:126
#13 0x00007f9c546f33f7 in start_thread () from /lib/libpthread.so.0
#14 0x00007f9c54462b3d in clone () from /lib/libc.so.6
#15 0x0000000000000000 in ?? ()
(gdb) up
#1  0x00007f9c543beaf0 in abort () from /lib/libc.so.6
(gdb) up
#2  0x00007f9c543f7a7b in ?? () from /lib/libc.so.6
(gdb) up
#3  0x00007f9c544003b2 in ?? () from /lib/libc.so.6
(gdb) up
#4  0x00007f9c54401360 in malloc () from /lib/libc.so.6
(gdb) up
#5  0x00007f9c544610ea in tsearch () from /lib/libc.so.6
(gdb) up
#6  0x00007f9c54ffb3a1 in LockGet (pn=0x482180a8) at ow_locks.c:173
173                    if ((opaque = (struct dev_opaque *)tsearch(dlock, 
&(pn->selected_connection->dev_db), dev_compare)) == NULL) { // unfound and 
uncreatable
(gdb) print dlock
$1 = (struct devlock *) 0x60c240
(gdb) print *dlock
$2 = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, 
__kind = -1, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
    __size = '\0' <repeats 16 times>, "", '\0' <repeats 19 times>, __align = 
0}, sn = " \000\000\000\000", users = 0}




Here's the output from owserver when I do an owget which never
finishes, but owserver keeps running:


owget -s 3000 26.22FB4F000000/VDD


  DEBUG: ow_net_server.c:ServerProcessAccept(203) 3000[1082530128] accept 0 
fd=17
  DEBUG: ow_net_server.c:ServerProcessAcceptUnlock(177) unlock 1082530128
  DEBUG: ow_net_server.c:ServerProcessAcceptUnlock(179) unlock 1082530128 done
  DEBUG: ow_net_server.c:ServerProcessAccept(216)  3000[1082530128] unlock 0
  DEBUG: ow_tcp_read.c:tcp_read(60) attempt 24 bytes Time:(10,0)
Byte buffer NETREAD, length=24
--000: 00 00 00 00 00 00 00 14 00 00 00 07 00 00 01 0A
--016: 00 00 00 00 00 00 00 00
   <........................>
  DEBUG: ow_tcp_read.c:tcp_read(108) n=24 nleft=0 n-nleft=24
  DEBUG: from_client.c:FromClient(65) FromClient payload=20 size=0 type=7 
sg=0x10A offset=0
  DEBUG: from_client.c:FromClient(73) FromClient (no servermessage) payload=20 
size=0 type=7 sg=0x10A offset=0
  DEBUG: ow_tcp_read.c:tcp_read(60) attempt 20 bytes Time:(10,0)
Byte buffer NETREAD, length=20
--000: 32 36 2E 32 32 46 42 34 46 30 30 30 30 30 30 2F
--016: 56 44 44 00
   <26.22FB4F000000/VDD.>
  DEBUG: ow_tcp_read.c:tcp_read(108) n=20 nleft=0 n-nleft=20
   CALL: data.c:DataHandler(102) DataHandler: parse path=26.22FB4F000000/VDD
  DEBUG: ow_parseobject.c:FS_OWQ_create(26) 26.22FB4F000000/VDD
   CALL: ow_parsename.c:FS_ParsedName_anywhere(115) path=[26.22FB4F000000/VDD]
  DEBUG: ow_cache.c:Cache_Get_Device(856) 26 22 FB 4F 00 00 00 15
  DEBUG: ow_cache.c:Cache_Get_Common(1086) Get from cache sn 26 22 FB 4F 00 00 
00 15 pointer=0x7f855d33c420 index=0 size=4
  DEBUG: ow_cache.c:Cache_Get_Common(1099) value found in cache. Remaining 
life: 57 seconds.
  DEBUG: ow_presence.c:CheckPresence(75) Found device on bus 0
  DEBUG: ow_parsename.c:FS_ParsedName_anywhere(199) Dirlength=17 
</26.22FB4F000000/VDD>
   CALL: data.c:DataHandler(145) Directory message (all at once)
  DEBUG: dirall.c:DirallHandler(66) OWSERVER Dir-All SpecifiedBus=0 path = 
/26.22FB4F000000/VDD
  DEBUG: ow_dir.c:FS_dir_remote(74) path=/26.22FB4F000000/VDD
   CALL: ow_dir.c:FS_dir_both(102) path=/26.22FB4F000000/VDD
  DEBUG: ow_dir.c:FS_dir_both(186) ret=-20
  DEBUG: ow_parseobject.c:FS_OWQ_destroy(141) /26.22FB4F000000/VDD
  DEBUG: ow_parsename.c:FS_ParsedName_destroy(64) /26.22FB4F000000/VDD
  DEBUG: data.c:DataHandler(162) DataHandler: FS_ParsedName_destroy done
  DEBUG: data.c:DataHandler(176) DataHandler: cm.ret=-20
  DEBUG: to_client.c:ToClient(60) payload=0 size=0, ret=-20, sg=0x10A offset=0 
Writev byte buffer ionr=1/1 length=24
--000: 00 00 00 00 00 00 00 00 FF FF FF EC 00 00 01 0A
--016: 00 00 00 00 00 00 00 00
   <........................>
  DEBUG: data.c:DataHandler(193) DataHandler: done
  DEBUG: ow_net_server.c:ServerProcessAccept(243) 1082530128 CLOSING
  DEBUG: ow_net_server.c:ServerProcessAccept(189) 3000[1082530128] try lock 0
  DEBUG: ow_net_server.c:ServerProcessAccept(195) 3000[1082530128] locked 0
  DEBUG: ow_net_server.c:ServerProcessAccept(203) 3000[1082530128] accept 0 fd=5
  DEBUG: ow_net_server.c:ServerProcessAcceptUnlock(177) unlock 1082530128
  DEBUG: ow_net_server.c:ServerProcessAcceptUnlock(179) unlock 1082530128 done
  DEBUG: ow_net_server.c:ServerProcessAccept(216)  3000[1082530128] unlock 0
  DEBUG: ow_tcp_read.c:tcp_read(60) attempt 24 bytes Time:(10,0)
Byte buffer NETREAD, length=24
--000: 00 00 00 00 00 00 00 14 00 00 00 02 00 00 01 0A
--016: 00 00 1F A0 00 00 00 00
   <........................>
  DEBUG: ow_tcp_read.c:tcp_read(108) n=24 nleft=0 n-nleft=24
  DEBUG: from_client.c:FromClient(65) FromClient payload=20 size=8096 type=2 
sg=0x10A offset=0
  DEBUG: from_client.c:FromClient(73) FromClient (no servermessage) payload=20 
size=8096 type=2 sg=0x10A offset=0
  DEBUG: ow_tcp_read.c:tcp_read(60) attempt 20 bytes Time:(10,0)
Byte buffer NETREAD, length=20
--000: 32 36 2E 32 32 46 42 34 46 30 30 30 30 30 30 2F
--016: 56 44 44 00
   <26.22FB4F000000/VDD.>
  DEBUG: ow_tcp_read.c:tcp_read(108) n=20 nleft=0 n-nleft=20
   CALL: data.c:DataHandler(102) DataHandler: parse path=26.22FB4F000000/VDD
  DEBUG: ow_parseobject.c:FS_OWQ_create(26) 26.22FB4F000000/VDD
   CALL: ow_parsename.c:FS_ParsedName_anywhere(115) path=[26.22FB4F000000/VDD]
  DEBUG: ow_cache.c:Cache_Get_Device(856) 26 22 FB 4F 00 00 00 15
  DEBUG: ow_cache.c:Cache_Get_Common(1086) Get from cache sn 26 22 FB 4F 00 00 
00 15 pointer=0x7f855d33c420 index=0 size=4
  DEBUG: ow_cache.c:Cache_Get_Common(1099) value found in cache. Remaining 
life: 57 seconds.
  DEBUG: ow_presence.c:CheckPresence(75) Found device on bus 0
  DEBUG: ow_parsename.c:FS_ParsedName_anywhere(199) Dirlength=17 
</26.22FB4F000000/VDD>
   CALL: data.c:DataHandler(125) Read message
  DEBUG: read.c:ReadHandler(55) ReadHandler:
  DEBUG: read.c:ReadHandler(61) ReadHandler: From Client sm->payload=20 
sm->size=8096 sm->offset=0
  DEBUG: read.c:ReadHandler(81) ReadHandler: call FS_read_postparse on 
/26.22FB4F000000/VDD
  DEBUG: ow_read.c:FS_read_postparse(78) /26.22FB4F000000/VDD
  DEBUG: ow_read.c:FS_read_distribute(196) /26.22FB4F000000/VDD
  DEBUG: ow_read.c:FS_r_given_bus(229) start
OWQ OneWireQuery structure of /26.22FB4F000000/VDD
    OneWireQuery size=8096 offset=0, extension=0
  DEBUG: ow_net_server.c:ServerProcessAccept(243) 1082530128 CLOSING
Byte buffer OneWireQuery buffer, length=8096
--000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
--016: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
--032: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
--048: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   <................................................................>
    OneWireQuery I=0 U=0 F=0 Y=0 D=Thu Jan  1 01:00:00 1970

--- OneWireQuery done
  DEBUG: ow_net_server.c:ServerProcessAccept(189) 3000[1082530128] try lock 0
  DEBUG: ow_net_server.c:ServerProcessAccept(195) 3000[1082530128] locked 0
  DEBUG: handler.c:Handler(150) OWSERVER handler done
  DEBUG: ow_net_server.c:ServerProcessHandler(165) Normal exit.
  DEBUG: to_client.c:ToClient(56) Send delay message
  DEBUG: to_client.c:ToClient(60) payload=-1 size=0, ret=0, sg=0x0 offset=0 
  DEBUG: to_client.c:ToClient(56) Send delay message
  DEBUG: to_client.c:ToClient(60) payload=-1 size=0, ret=0, sg=0x0 offset=0 
  DEBUG: to_client.c:ToClient(56) Send delay message
... etc ...


-- 
Steinar ; NIL DIFFICILE VOLENTI ; http://latinitas.org ; http://voksenlia.net

------------------------------------------------------------------------------
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
_______________________________________________
Owfs-developers mailing list
Owfs-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/owfs-developers

Reply via email to