I looked briefly at the source, and I can’t find any reason for the hanging
lock at ow_read.c:244.

It should lock the parsedname, read the value, and unlock as soon as it’s
finished. I can’t see how it could skip the unlock call.

 

I’m afraid I have too much here at work and home to look more into it right
now.

 

/Christian

 

 

 

From: Patrik Åkerfeldt [mailto:[email protected]] 
Sent: 11 March 2010 18:25
To: OWFS (One-wire file system) discussion and help
Subject: Re: [Owfs-developers] owserver stops responding

 

Thanks for the reply,

Here's the output from gdb:

Thread 3 (Thread 0x473f0940 (LWP 24920)):
#0  0x0000003c0300c21f in sem_timedwait () from /lib64/libpthread.so.0
#1  0x00000000004047ee in Handler (file_descriptor=14) at handler.c:249
#2  0x00002ba23185266b in ServerProcessHandler (arg=0x27dc6c0) at ow_net_server.c:158
#3  0x0000003c030062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003c028d1e3d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x47df1940 (LWP 24921)):
#0  0x0000003c0300c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003c030088a5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2  0x0000003c03008333 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002ba231850466 in LockGet (pn=0x47df1048) at ow_locks.c:189
#4  0x00002ba231858cad in FS_r_given_bus (owq=0x47df1020) at ow_read.c:244
#5  0x00002ba231858fc2 in FS_read_distribute (owq=0x47df1020) at ow_read.c:207
#6  0x00002ba2318597ec in FS_read_postparse (owq=0x47df1020) at ow_read.c:110
#7  0x000000000040332c in ReadHandler (hd=0x473efec0, cm=0x47df10e0, owq=0x47df1020) at read.c:84
#8  0x0000000000403ff7 in DataHandler (v=<value optimized out>) at data.c:133
#9  0x0000003c030062f7 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003c028d1e3d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2ba231d322c0 (LWP 14327)):
#0  0x0000003c0300dba8 in do_sigwait () from /lib64/libpthread.so.0
#1  0x0000003c0300dc4d in sigwait () from /lib64/libpthread.so.0
#2  0x00002ba231852179 in ServerProcess (HandlerRoutine=0x404350 <Handler>, Exit=0x402050 <ow_exit>) at ow_net_server.c:349
#3  0x000000000040223b in main (argc=8, argv=0x7fff792bf808) at owserver.c:162

I hope this will explain things.
-Patrik

2010/3/10 Christian Magnusson <[email protected]>

It really does look like a deadlock in the code. It also seems a bit strange
that the handler_count values inside the braces are not incremented one by one.

This means that you have multiple applications polling the device. I would
guess 5-10 shell scripts filling up the request queue/capacity.

 

# ps -ef | grep owserver

Find the process id of owserver.

 

Halt the execution with gdb.

# gdb /opt/owfs/bin/owserver -p 12345

(show backtrace for all running threads with gdb)

(gdb) thread apply all bt

 

Press return to scroll through all the pages and copy the result here. It will
show whether any thread is hanging in a lock call.
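
If an interactive session is inconvenient, the same backtrace can probably be captured in one step. This is only a minimal sketch, assuming gdb is installed and you are allowed to attach to the process; the function name dump_threads is made up for illustration.

```shell
#!/bin/sh
# Sketch: dump backtraces for all threads of a running process without an
# interactive gdb session. dump_threads is a hypothetical helper name.
dump_threads() {
    pid=$1
    # -batch makes gdb exit after running the -ex commands, which also
    # detaches from the process so it can continue running.
    gdb -p "$pid" -batch -ex "thread apply all bt" 2>&1
}
```

For example, "dump_threads 12345 > owserver-bt.txt" would write the output to a file that can be pasted into a reply, with 12345 replaced by the process id found with ps.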

 

I will be on vacation until Sunday, but I’m sure Paul can track down the
problem if the gdb output shows an obvious hang.

 

/Christian

 

 

From: Patrik Åkerfeldt [mailto:[email protected]] 
Sent: 10 March 2010 18:57
To: [email protected]
Subject: Re: [Owfs-developers] owserver stops responding

 

I have the same problem with owfs-2.7p31. Although the output is slightly
different:

  DEBUG: handler.c:SingleHandler(317) NOPING handler {165} /26.6447E7000000/temperature
   CALL: handler.c:SingleHandler(273) sem_timedwait timeout time=0.101 (timeout=100 ms)
  DEBUG: handler.c:SingleHandler(317) NOPING handler {170} /26.6447E7000000/vis
   CALL: handler.c:SingleHandler(273) sem_timedwait timeout time=0.101 (timeout=100 ms)
  DEBUG: handler.c:SingleHandler(317) NOPING handler {153} /26.6447E7000000/temperature
   CALL: handler.c:SingleHandler(273) sem_timedwait timeout time=0.101 (timeout=100 ms)
  DEBUG: handler.c:SingleHandler(317) NOPING handler {161} /26.6447E7000000/temperature
   CALL: handler.c:SingleHandler(273) sem_timedwait timeout time=0.101 (timeout=100 ms)
  DEBUG: handler.c:SingleHandler(317) NOPING handler {157} /26.6447E7000000/vis
   CALL: handler.c:SingleHandler(273) sem_timedwait timeout time=0.101 (timeout=100 ms)
  DEBUG: handler.c:SingleHandler(317) NOPING handler {156} /26.6447E7000000/vis

I would really appreciate some help in this matter.
-Patrik

On 10 March 2010 at 11:49, Patrik Åkerfeldt <[email protected]> wrote:

No, that was the wrong conclusion on my part. I removed the /temperature
polling and kept only the vis reading, but it still "hangs".

I will try the latest version of owfs instead.

 

-Patrik

 

On 10 March 2010 at 11:46, Patrik Åkerfeldt <[email protected]> wrote:

 

Could it be that I poll the same device (26.6447E7000000) for different
readings (temperature and vis) at the same time, and that it somehow hangs?

 

Polling is done from bash scripts invoked by crontab at the same
interval.
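
One way to test this idea from the script side would be to make the cron jobs take turns. The following is only a sketch, assuming flock from util-linux is available; the lock file path and the function name poll_serialized are made up for illustration. It would not fix a lock that is left held inside owserver itself, but it should show whether simultaneous requests are needed to trigger the hang.

```shell
#!/bin/sh
# Sketch: run one poll at a time by holding an exclusive lock on a shared file.
# LOCKFILE and poll_serialized are hypothetical names, not part of OWFS.
LOCKFILE=${LOCKFILE:-/tmp/owpoll.lock}

poll_serialized() {
    # Wait up to 30 seconds for the lock, then run the given command while
    # holding it; flock passes through the command's exit status.
    flock -w 30 "$LOCKFILE" "$@"
}
```

Each cron script would then wrap its read, for example "poll_serialized owread -s localhost:3001 /26.6447E7000000/temperature", assuming the owshell client owread is what the scripts already use.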

 

-Patrik

 

On 9 March 2010 at 21:29, Patrik Åkerfeldt <[email protected]> wrote:

 

I've added a new device to my 1-wire network that I regularly poll (a
solar sensor + temp from hobby-boards). Since then (I think!), owserver
seems to stop responding after a while. Sometimes it takes 6-10 hours before
it "hangs" and sometimes just a couple of minutes.

The last time I started owserver with --error_level=9 and this is the output
when owserver is malfunctioning:

  DEBUG: to_client.c:ToClient(63) Send delay message
  DEBUG: handler.c:SingleHandler(239) NOPING handler {43} /26.6447E7000000/temperature
  DEBUG: handler.c:SingleHandler(229) PING handler {51} /26.6447E7000000/temperature
  DEBUG: to_client.c:ToClient(56) payload=-1 size=0, ret=0, sg=0x0 offset=0
  DEBUG: to_client.c:ToClient(63) Send delay message
  DEBUG: handler.c:SingleHandler(239) NOPING handler {53} /26.6447E7000000/temperature
  DEBUG: handler.c:SingleHandler(239) NOPING handler {55} /26.6447E7000000/vis
  DEBUG: handler.c:SingleHandler(239) NOPING handler {60} /26.6447E7000000/temperature
  DEBUG: handler.c:SingleHandler(239) NOPING handler {59} /26.6447E7000000/vis
  DEBUG: handler.c:SingleHandler(229) PING handler {57} /26.6447E7000000/temperature
  DEBUG: to_client.c:ToClient(56) payload=-1 size=0, ret=0, sg=0x0 offset=0
  DEBUG: to_client.c:ToClient(63) Send delay message
  DEBUG: handler.c:SingleHandler(239) NOPING handler {47} /26.6447E7000000/temperature
  DEBUG: handler.c:SingleHandler(239) NOPING handler {45} /26.6447E7000000/vis
  DEBUG: handler.c:SingleHandler(239) NOPING handler {49} /26.6447E7000000/vis
  DEBUG: handler.c:SingleHandler(239) NOPING handler {43} /26.6447E7000000/temperature
  DEBUG: handler.c:SingleHandler(239) NOPING handler {51} /26.6447E7000000/temperature
  DEBUG: handler.c:SingleHandler(239) NOPING handler {53} /26.6447E7000000/temperature
  DEBUG: handler.c:SingleHandler(239) NOPING handler {55} /26.6447E7000000/vis
  DEBUG: handler.c:SingleHandler(239) NOPING handler {60} /26.6447E7000000/temperature
  DEBUG: handler.c:SingleHandler(239) NOPING handler {59} /26.6447E7000000/vis
  DEBUG: handler.c:SingleHandler(239) NOPING handler {57} /26.6447E7000000/temperature
  DEBUG: handler.c:SingleHandler(229) PING handler {47} /26.6447E7000000/temperature
  DEBUG: to_client.c:ToClient(56) payload=-1 size=0, ret=0, sg=0x0 offset=0

I can't tell what's wrong from these messages, but perhaps somebody else can?
The issue is temporarily resolved by restarting owserver.
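
Until the cause is found, a periodic check could at least notice the hang so that the restart does not depend on someone spotting it. This is a rough sketch, assuming timeout from GNU coreutils is available and that a hung owserver makes a client request block or fail; the function name responds_within is made up for illustration.

```shell
#!/bin/sh
# Sketch: succeed only if a probe command finishes successfully within a
# time limit. responds_within is a hypothetical helper name.
responds_within() {
    secs=$1
    shift
    # timeout kills the probe and returns non-zero (124) if it runs too long.
    timeout "$secs" "$@" >/dev/null 2>&1
}
```

A cron job could then run something like "responds_within 10 owdir -s localhost:3001 / || echo owserver is not responding", assuming the owshell client owdir is installed. How to restart owserver is left out here, since that depends on how it is launched on the machine.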

owserver is started this way:
/opt/owfs/bin/owserver -u -p 3001 --usb_regulartime --timeout_volatile=0 --foreground --error_level=9

I'm running owserver from owfs-2.7p26.

Thanks,
-Patrik

 

 

 

 




_______________________________________________
Owfs-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/owfs-developers

 

 
