Since I made a fuss about this yesterday on the mailing list, I want to 
apologize just as publicly.

The problem I was seeing is the result of my failure to understand the subtle 
mechanism for terminating the IoTivity stack.  It is a multistage process: 
first the adapters are stopped, then they are terminated.  When done in that 
order, the stack shuts down cleanly.

I am sorry I was unable to understand this mechanism before writing to the 
list.  Thank you to Abishek for providing an explanation I could understand.

John Light
Intel OTC OIC Development

From: iotivity-dev-bounces at lists.iotivity.org 
[mailto:[email protected]] On Behalf Of Light, John J
Sent: Monday, July 06, 2015 6:27 PM
To: iotivity-dev at lists.iotivity.org
Subject: [dev] thread termination

All,

Today I faced a problem running Jenkins to completion on my patch "New IP 
Adapter supports IPv6".  It turned out to be a problem terminating the IoTivity 
stack.  I don't understand why none of the previous 100 Jenkins tests had found 
the problem, but this one did.  The way it failed was to hang the test, which 
resulted in an opaque Jenkins timeout.

I chased the problem down to the order of resource termination in 
camessagehandler.c.  In CATerminateMessageHandler two resources are shut down 
in this order:

1. ca_thread_pool_free.  This joins all the threads started out of the 
   thread pool.

2. CATerminateAdapters.  This terminates the adapters.

For those unfamiliar with thread join I will say that the 
pthread_join(threadID) function waits for the thread with id threadID to exit.  
This wait can be a very long time if the thread doesn't exit, and that's what 
happened in my case.  In fact, two threads in the IP Adapter didn't know they 
should exit, so their respective joins wouldn't complete, leaving Jenkins to 
time out.

The solution is quite simple: stop the adapter before joining the threads.  
Stopping the adapter tells both threads to exit.  Then, when the join occurs, 
the thread requesting the joins isn't blocked.
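
To illustrate the ordering, here is a minimal sketch using plain pthreads.  It 
is not IoTivity code: g_running, worker, stop_workers, join_workers and 
run_shutdown_demo are hypothetical names made up for this example, and the 
flag-polling loop is only an assumed stand-in for whatever the adapter threads 
really wait on.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WORKER_COUNT 2

/* Hypothetical stand-ins for adapter state; not part of IoTivity. */
static atomic_bool g_running;   /* true while the workers should keep going */
static atomic_int  g_exited;    /* number of workers that have left their loop */

/* Worker loop: keeps running until it is told to stop. */
static void *worker(void *arg)
{
    (void)arg;
    while (atomic_load(&g_running)) {
        sched_yield();          /* placeholder for real adapter work */
    }
    atomic_fetch_add(&g_exited, 1);
    return NULL;
}

/* Stage 1: ask the workers to exit.  This does not wait for them. */
static void stop_workers(void)
{
    atomic_store(&g_running, false);
}

/* Stage 2: join.  Each pthread_join returns only after that worker exits. */
static void join_workers(pthread_t *threads)
{
    for (int i = 0; i < WORKER_COUNT; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
}

/* Starts the workers, shuts them down stop-then-join, and returns how
 * many workers exited. */
int run_shutdown_demo(void)
{
    pthread_t threads[WORKER_COUNT];

    atomic_store(&g_running, true);
    atomic_store(&g_exited, 0);

    for (int i = 0; i < WORKER_COUNT; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }

    stop_workers();             /* must come first ...                 */
    join_workers(threads);      /* ... otherwise this blocks forever   */

    return atomic_load(&g_exited);
}
```

Calling run_shutdown_demo() from a main() returns once both workers have 
exited.  If the calls to stop_workers and join_workers are swapped, the first 
pthread_join never returns, which is the kind of hang I saw.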

The reason I am bringing this up is that there appears to be an impasse over 
this issue.  Ossama Othman found this problem while working with the Linux GATT 
adapter and submitted patch 1044 (https://gerrit.iotivity.org/gerrit/1044), 
"Destroy threadpool after all transports have been terminated", to fix it.  If 
his patch had been merged, I wouldn't have had to spend the day solving the 
problem again.

I have examined the bt_le_adapter code, and it seems to have the same problem.  
It has four threads that only stop if the adapter terminate code is called.  In 
current master this doesn't happen, so they should have the same problem.  I am 
not in a position to run a BLE adapter that uses the bt_le code to test my 
observation, but I will welcome an explanation of how it works in current 
master.

Since the IP Adapter won't shut down properly without Ossama's patch, I will 
include his patch in my patch.

Since the reviewers have not been able to reach a consensus on the need for 
Ossama's patch, I invite all of you to examine it and provide feedback.  I would 
like to see constructive criticism of both his patch and the way master 
currently works.

If Ossama's patch can't be merged, then we won't be able to merge IPv6 either.

John Light
Intel OTC OIC Development