Re: AW: Ofono 1.21 crashes with Sierra MC7455

2017-10-26 Thread Christophe Ronco

I'll do a quick review, run checkpatch, and post it.

Thanks for testing, and sorry for not always taking the time needed to 
propose patches to mainline.



Christophe


On 10/26/2017 12:54 PM, Jonas Bonn wrote:

On 10/26/2017 12:43 PM, Christophe Ronco wrote:




Seems to be this:

i)  When the modem is first powered on, service discovery takes some 
time (more than 5 seconds)
ii)  The service discovery timeout fires before the QMI request 
returns so discover_reply gets called before discover_callback and 
things get cleaned up
iii)  Then the QMI request returns and discover_callback gets called 
even though the request has timed out in ii)


...and this is where things go wrong because the userdata pointer to 
discover_callback is probably no longer valid.


How do we handle the QMI request returning a _late_ response, i.e. 
after it technically has timed out?  I'll dig a bit more...


/Jonas


Hi Jonas,

Looking through my personal patches, it seems I have already run into 
this.  Please find attached the patch I currently use in my setup for 
this issue.




Yes, that's exactly what I was just in the process of implementing.  
Glad you beat me to it.  :)


I tested your patch and it solves the problem.  Can you submit it to 
the list?  If not, I will... :)


I think service_create_reply() has the same issue, but I don't hit it 
in my setup.


Eswaran:  try the patch from Christophe... I'm 99% certain it solves 
your issue.


/Jonas


Christophe






___
ofono mailing list
ofono@ofono.org
https://lists.ofono.org/mailman/listinfo/ofono







Re: AW: Ofono 1.21 crashes with Sierra MC7455

2017-10-26 Thread Jonas Bonn

On 10/26/2017 12:43 PM, Christophe Ronco wrote:




Seems to be this:

i)  When the modem is first powered on, service discovery takes some 
time (more than 5 seconds)
ii)  The service discovery timeout fires before the QMI request 
returns so discover_reply gets called before discover_callback and 
things get cleaned up
iii)  Then the QMI request returns and discover_callback gets called 
even though the request has timed out in ii)


...and this is where things go wrong because the userdata pointer to 
discover_callback is probably no longer valid.


How do we handle the QMI request returning a _late_ response, i.e. 
after it technically has timed out?  I'll dig a bit more...


/Jonas


Hi Jonas,

Looking through my personal patches, it seems I have already run into 
this.  Please find attached the patch I currently use in my setup for 
this issue.




Yes, that's exactly what I was just in the process of implementing. Glad 
you beat me to it.  :)


I tested your patch and it solves the problem.  Can you submit it to the 
list?  If not, I will... :)


I think service_create_reply() has the same issue, but I don't hit it in 
my setup.


Eswaran:  try the patch from Christophe... I'm 99% certain it solves 
your issue.


/Jonas


Christophe








Re: AW: Ofono 1.21 crashes with Sierra MC7455

2017-10-26 Thread Christophe Ronco




Seems to be this:

i)  When the modem is first powered on, service discovery takes some 
time (more than 5 seconds)
ii)  The service discovery timeout fires before the QMI request 
returns so discover_reply gets called before discover_callback and 
things get cleaned up
iii)  Then the QMI request returns and discover_callback gets called 
even though the request has timed out in ii)


...and this is where things go wrong because the userdata pointer to 
discover_callback is probably no longer valid.


How do we handle the QMI request returning a _late_ response, i.e. 
after it technically has timed out?  I'll dig a bit more...


/Jonas


Hi Jonas,

Looking through my personal patches, it seems I have already run into 
this.  Please find attached the patch I currently use in my setup for 
this issue.


Christophe




From 15666c90d20ee7481d0180a8246c53e5a29d5bcd Mon Sep 17 00:00:00 2001
From: Christophe Ronco 
Date: Wed, 27 Sep 2017 10:38:50 +0200
Subject: [PATCH] qmi: remove request when it timeouts

When modem does not answer or answers slowly to a discovery request,
a timeout occurs.
In timeout callback, request should be removed from queues to avoid
treating answer if it arrives later.
---
 drivers/qmimodem/qmi.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/qmimodem/qmi.c b/drivers/qmimodem/qmi.c
index c538cb9..65263d2 100644
--- a/drivers/qmimodem/qmi.c
+++ b/drivers/qmimodem/qmi.c
@@ -1073,6 +1073,7 @@ struct discover_data {
 	qmi_discover_func_t func;
 	void *user_data;
 	qmi_destroy_func_t destroy;
+	uint8_t tid;
 	guint timeout;
 };
 
@@ -1181,14 +1182,38 @@ static gboolean discover_reply(gpointer user_data)
 {
 	struct discover_data *data = user_data;
 	struct qmi_device *device = data->device;
+	unsigned int tid = (unsigned int)(data->tid);
+	GList *list;
+	struct qmi_request *req = NULL;
 
 	data->timeout = 0;
 
+	/* remove request from queues */
+	if (tid != 0) {
+		list = g_queue_find_custom(device->req_queue,
+				GUINT_TO_POINTER(tid), __request_compare);
+
+		if (list) {
+			req = list->data;
+			g_queue_delete_link(device->req_queue, list);
+		} else {
+			list = g_queue_find_custom(device->control_queue,
+					GUINT_TO_POINTER(tid), __request_compare);
+
+			if (list) {
+				req = list->data;
+				g_queue_delete_link(device->control_queue,
+						list);
+			}
+		}
+	}
+
 	if (data->func)
 		data->func(device->version_count,
 				device->version_list, data->user_data);
 
 	__qmi_device_discovery_complete(data->device, &data->super);
+	__request_free(req, NULL);
 
 	return FALSE;
 }
@@ -1234,6 +1259,7 @@ bool qmi_device_discover(struct qmi_device *device, qmi_discover_func_t func,
 
 	hdr->type = 0x00;
 	hdr->transaction = device->next_control_tid++;
+	data->tid = hdr->transaction;
 
 	__request_submit(device, req, hdr->transaction);
 
-- 
2.7.4
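
The idea behind the patch can be sketched outside of oFono as follows. This is only a minimal illustration, not the code in drivers/qmimodem/qmi.c: a plain linked list stands in for the GLib queues, and every name in it (pending_req, req_queue, queue_submit, queue_take, on_timeout, on_reply, count_call) is hypothetical. The point it shows is that once the timeout path has removed the request for a given transaction id, a reply arriving later finds no matching request and is dropped instead of touching stale user data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pending request; not oFono's struct. */
struct pending_req {
	uint8_t tid;				/* transaction id */
	void (*callback)(void *user_data);	/* run when a reply arrives */
	void *user_data;
	struct pending_req *next;
};

/* Hypothetical stand-in for the request queue (a simple linked list). */
struct req_queue {
	struct pending_req *head;
};

/* Queue a request; returns NULL on allocation failure. */
static struct pending_req *queue_submit(struct req_queue *q, uint8_t tid,
				void (*callback)(void *), void *user_data)
{
	struct pending_req *req = calloc(1, sizeof(*req));

	if (!req)
		return NULL;

	req->tid = tid;
	req->callback = callback;
	req->user_data = user_data;
	req->next = q->head;
	q->head = req;

	return req;
}

/* Unlink and return the request with this tid, or NULL if there is none. */
static struct pending_req *queue_take(struct req_queue *q, uint8_t tid)
{
	struct pending_req **link;

	for (link = &q->head; *link; link = &(*link)->next) {
		if ((*link)->tid == tid) {
			struct pending_req *req = *link;

			*link = req->next;
			req->next = NULL;
			return req;
		}
	}

	return NULL;
}

/* Timeout path: drop the request so that a late reply finds nothing. */
static void on_timeout(struct req_queue *q, uint8_t tid)
{
	free(queue_take(q, tid));	/* free(NULL) is a no-op */
}

/* Reply path: returns 1 if a callback ran, 0 if the reply was stale. */
static int on_reply(struct req_queue *q, uint8_t tid)
{
	struct pending_req *req = queue_take(q, tid);

	if (!req)
		return 0;

	if (req->callback)
		req->callback(req->user_data);

	free(req);
	return 1;
}

/* Example callback: counts how many times it has been invoked. */
static void count_call(void *user_data)
{
	(*(int *)user_data)++;
}
```

With this shape, calling on_timeout() for a transaction and then on_reply() for the same transaction returns 0 and never runs the callback, which is the behaviour the patch is after for late discovery replies.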



Re: AW: Ofono 1.21 crashes with Sierra MC7455

2017-10-26 Thread Jonas Bonn

On 10/26/2017 11:14 AM, Jonas Bonn wrote:

On 10/26/2017 11:04 AM, Jonas Bonn wrote:

On 10/26/2017 10:55 AM, Jonas Bonn wrote:

On 10/26/2017 10:48 AM, Eswaran Vinothkumar (BEG/PJ-IOT-EL) wrote:


May I know whether there is any config option to disable the signal 
handler in oFono?


No, there's not... you need to hack it out of the code. Search for 
the following line in src/main.c:


signal = setup_signalfd();

and comment it out.  I think that's sufficient...


Sorry, that was wrong.  It's signal_setup() in src/log.c that needs 
to be adjusted.  Just put a 'return' at the beginning of that 
function so that it becomes a no-op.


/Jonas

Here's a backtrace from the crasher that I'm seeing.  I suspect it's 
the same issue you have:


#0  0x7ffa4c7a316e in g_queue_remove ()
   from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#1  0x5624b81441b9 in __qmi_device_discovery_complete (d=0x5624b95bcd40,
    device=<optimized out>) at drivers/qmimodem/qmi.c:889
#2  0x5624b8144d21 in handle_packet (buf=<optimized out>,
    hdr=<optimized out>, device=0x5624b95bcc90) at drivers/qmimodem/qmi.c:817
#3  received_data (user_data=0x5624b95bcc90, cond=<optimized out>,
    channel=<optimized out>) at drivers/qmimodem/qmi.c:865
#4  0x7ffa4c79168a in g_main_context_dispatch ()
   from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x7ffa4c791a40 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6  0x7ffa4c791d62 in g_main_loop_run ()
   from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#7  0x5624b8107e77 in main (argc=<optimized out>, argv=<optimized out>)
    at src/main.c:306

/Jonas


Seems to be this:

i)  When the modem is first powered on, service discovery takes some 
time (more than 5 seconds)
ii)  The service discovery timeout fires before the QMI request returns 
so discover_reply gets called before discover_callback and things get 
cleaned up
iii)  Then the QMI request returns and discover_callback gets called 
even though the request has timed out in ii)


...and this is where things go wrong because the userdata pointer to 
discover_callback is probably no longer valid.


How do we handle the QMI request returning a _late_ response, i.e. after 
it technically has timed out?  I'll dig a bit more...


/Jonas










Re: AW: Ofono 1.21 crashes with Sierra MC7455

2017-10-26 Thread Jonas Bonn

On 10/26/2017 11:04 AM, Jonas Bonn wrote:

On 10/26/2017 10:55 AM, Jonas Bonn wrote:

On 10/26/2017 10:48 AM, Eswaran Vinothkumar (BEG/PJ-IOT-EL) wrote:


May I know whether there is any config option to disable the signal 
handler in oFono?


No, there's not... you need to hack it out of the code.  Search for 
the following line in src/main.c:


signal = setup_signalfd();

and comment it out.  I think that's sufficient...


Sorry, that was wrong.  It's signal_setup() in src/log.c that needs to 
be adjusted.  Just put a 'return' at the beginning of that function so 
that it becomes a no-op.


/Jonas

Here's a backtrace from the crasher that I'm seeing.  I suspect it's the 
same issue you have:


#0  0x7ffa4c7a316e in g_queue_remove ()
   from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#1  0x5624b81441b9 in __qmi_device_discovery_complete (d=0x5624b95bcd40,
    device=<optimized out>) at drivers/qmimodem/qmi.c:889
#2  0x5624b8144d21 in handle_packet (buf=<optimized out>,
    hdr=<optimized out>, device=0x5624b95bcc90) at drivers/qmimodem/qmi.c:817
#3  received_data (user_data=0x5624b95bcc90, cond=<optimized out>,
    channel=<optimized out>) at drivers/qmimodem/qmi.c:865
#4  0x7ffa4c79168a in g_main_context_dispatch ()
   from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x7ffa4c791a40 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6  0x7ffa4c791d62 in g_main_loop_run ()
   from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#7  0x5624b8107e77 in main (argc=<optimized out>, argv=<optimized out>)
    at src/main.c:306

/Jonas








Re: AW: Ofono 1.21 crashes with Sierra MC7455

2017-10-26 Thread Jonas Bonn

On 10/26/2017 10:55 AM, Jonas Bonn wrote:

On 10/26/2017 10:48 AM, Eswaran Vinothkumar (BEG/PJ-IOT-EL) wrote:


May I know whether there is any config option to disable the signal 
handler in oFono?


No, there's not... you need to hack it out of the code.  Search for 
the following line in src/main.c:


signal = setup_signalfd();

and comment it out.  I think that's sufficient...


Sorry, that was wrong.  It's signal_setup() in src/log.c that needs to 
be adjusted.  Just put a 'return' at the beginning of that function so 
that it becomes a no-op.
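
As a rough illustration of what that early return does (this is not the code from src/log.c; the names crash_handler, demo_signal_setup and abort_is_default are made up, and a `disabled` flag stands in for the hand-edited `return`), here is a small sketch using SIGABRT. When the setup function returns before installing anything, the signal keeps its default disposition, so the kernel is free to write a core dump.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical handler standing in for a daemon's own crash handler. */
static void crash_handler(int signo)
{
	(void)signo;
}

/*
 * Hypothetical setup function. With 'disabled' set it returns before
 * installing anything, which is the effect of putting a 'return' at
 * the top of the function. Returns 0 on success, -1 on failure.
 */
static int demo_signal_setup(int disabled)
{
	if (disabled)
		return 0;	/* no-op: default disposition is kept */

	return signal(SIGABRT, crash_handler) == SIG_ERR ? -1 : 0;
}

/* Returns 1 if SIGABRT has the default disposition, 0 if not, -1 on error. */
static int abort_is_default(void)
{
	/* signal() returns the previous handler, so read it this way... */
	void (*prev)(int) = signal(SIGABRT, SIG_DFL);

	if (prev == SIG_ERR)
		return -1;

	/* ...and put a non-default handler back afterwards. */
	if (prev != SIG_DFL)
		signal(SIGABRT, prev);

	return prev == SIG_DFL;
}
```

Calling demo_signal_setup(1) leaves abort_is_default() reporting 1, while demo_signal_setup(0) installs the handler and makes it report 0.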


/Jonas



Re: AW: Ofono 1.21 crashes with Sierra MC7455

2017-10-26 Thread Jonas Bonn

On 10/26/2017 10:48 AM, Eswaran Vinothkumar (BEG/PJ-IOT-EL) wrote:


On 10/25/2017 02:38 PM, Eswaran Vinothkumar (BEG/PJ-IOT-EL) wrote:

Hello,

For our next connectivity project we are planning to use ConnMan
along with oFono.

The versions being used are ConnMan 1.35 and oFono 1.21.

After system start-up, I see that the oFono process exits
whenever I try to enable the modem by calling the Python
script enable-modem in the test directory. However, if I restart
the ofono systemd service, everything works fine.


A couple of comments here:

i)  You have connman and ofono running via systemd and you're calling 
enable-modem from the test directory... I don't think this is a robust 
way to go about testing things.  Try to simplify by stopping connman 
and ofono and running ofono manually from the command line before 
calling the test script.


# systemctl stop connman
# systemctl stop ofono
# ofonod -d -n
# ./enable-modem

ii)  When you say "everything works fine", what do you mean?  The test 
script can run...?


Yes, after restarting ofono.service, I could successfully run the 
test scripts.




I suspect you are seeing the issue I mentioned in iii) below...



iii)  There _is_ a race in the gobi module at just the spot that you 
are seeing the crash.  I have seen it many times but haven't taken the 
time to isolate the issue... if I get a moment today, I might take a 
look at it.  I can trigger it by calling enable-modem too early in 
ofono's startup process... it occurs maybe 1 time in 5 on my setup.


iv)  Like Christophe said, disable ofono's signal handler and enable 
core dumps before running ofono.  The core dump is easier to analyze 
than ofono's backtrace.
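
For the "enable core dumps" part, the usual route is `ulimit -c unlimited` in the shell that starts ofono. The same thing can be done from C with the POSIX getrlimit()/setrlimit() calls; the sketch below is only an illustration of that, and the function name enable_core_dumps is made up. It would have to run in the process itself, or in a small wrapper before exec'ing the daemon, since resource limits are inherited by child processes.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Raise the soft core-file size limit to the hard limit so that the
 * kernel is allowed to write a core file when the process crashes.
 * Returns 0 on success, -1 on failure (errno is set by the libc call).
 */
static int enable_core_dumps(void)
{
	struct rlimit lim;

	if (getrlimit(RLIMIT_CORE, &lim) < 0)
		return -1;

	/* An unprivileged process may raise its soft limit up to the hard one. */
	lim.rlim_cur = lim.rlim_max;

	return setrlimit(RLIMIT_CORE, &lim);
}
```

Where the core file ends up, and whether one is written at all, still depends on system settings outside the process, such as the kernel's core pattern.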


May I know whether there is any config option to disable the signal 
handler in oFono?


No, there's not... you need to hack it out of the code.  Search for the 
following line in src/main.c:


signal = setup_signalfd();

and comment it out.  I think that's sufficient...

/Jonas





/Jonas


I have attached the logs for your reference. Please let me know,
if any further log messages are required.

Modem chip: MC7455 from Sierra

Linux kernel: 4.9.44-fslc+g8f876e1 (freescale)

Best regards
Vinothkumar Eswaran




