Hi Jonas,
OK, I just pushed it; please test once on 4.7:
============================================================
branch: opensaf-4.7.x
parent: 8043:4a8a00097561
user: A V Mahesh <mahesh.va...@oracle.com>
date: Thu Sep 15 10:50:31 2016 +0530
summary: dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]
============================================================
-AVM
On 9/15/2016 12:08 AM, Jonas Arndt wrote:
Mahesh,
Can we get this back-ported to 4.7.x as well?
Cheers,
// Jonas
------------------------------------------------------------------------
*[tickets:#2014] <https://sourceforge.net/p/opensaf/tickets/2014/>
Rebooted controller not detected in TCP*
*Status:* review
*Milestone:* 5.0.1
*Created:* Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
*Last Updated:* Wed Sep 14, 2016 04:51 AM UTC
*Owner:* A V Mahesh (AVM)
*Attachments:*
* logs.tgz
<https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz>
(84.1 kB; application/x-compressed-tar)
* tcp_user_timeout_2014.patch
<https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch>
(5.5 kB; application/octet-stream)
OS environment:
Debian Jessie (OpenSAF is running on bare metal, no containers or VMs)
4.4.7 kernel
Network eth0, bonded, OVS (I have tried all of them and the problem is there in
all configurations)
In 20% of the cases, a "reboot -f" on controller2 is not detected or
acted on. The mds.log shows:
Sep 7 6:44:23.918566 osafamfd[41365] ERR |MDS_SND_RCV: Adest=<0x00000000,1>
Sep 7 6:44:23.918595 osafamfd[41365] ERR |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep 7 6:44:34.018662 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or Error occured
Sep 7 6:44:34.018751 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep 7 6:44:34.018789 osafamfd[41365] ERR |MDS_SND_RCV: Adest=<0x00000000,1>
Sep 7 6:44:34.018818 osafamfd[41365] ERR |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep 7 6:44:44.118832 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or Error occured
Sep 7 6:44:44.118919 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep 7 6:44:44.118955 osafamfd[41365] ERR |MDS_SND_RCV: Adest=<0x00000000,1>
Sep 7 6:44:44.118984 osafamfd[41365] ERR |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep 7 6:44:54.218987 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or Error occured
Sep 7 6:44:54.219085 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep 7 6:44:54.219139 osafamfd[41365] ERR |MDS_SND_RCV: Adest=<0x00000000,1>
Sep 7 6:44:54.219168 osafamfd[41365] ERR |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Still, there is nothing in the syslog indicating that controller2 has
left the cluster. This is for TCP.
When the node comes back online (without OpenSAF being started),
controller1 finally notices and fails over the apps.
When the reboot is not detected, the TCP keepalives stop and the
connection goes into retransmits instead. I have attached two tshark
sessions captured on controller1, covering traffic between controller1
and controller2. The failed reboot detection is captured in
"ctrl2_failed_detection.trc", and a working detection is in
"ctrl2_working.trc". I have also attached all logs from
/var/log/opensaf and the syslog (all from controller1).
It appears to me that we are hitting something similar to
"http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect"
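For illustration only, here is a minimal sketch of the socket option named in the changeset summary, TCP_USER_TIMEOUT. On Linux it bounds how long transmitted data may stay unacknowledged before the kernel drops the connection, which covers the case where retransmits have replaced keepalive probes. This is not the OpenSAF dtm code: the helper name enable_fail_fast and the 10000 ms value are assumptions, and it needs Linux with Python 3.6 or later.

```python
import socket


def enable_fail_fast(sock, user_timeout_ms=10000):
    """Hypothetical helper, not part of OpenSAF.

    Sets SO_KEEPALIVE and TCP_USER_TIMEOUT on a TCP socket and returns
    the TCP_USER_TIMEOUT value the kernel reports back (milliseconds).
    """
    # Keepalive probes detect a dead peer only while the connection is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_USER_TIMEOUT limits how long sent data may remain unacknowledged
    # before the kernel closes the connection. The value is illustrative.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("TCP_USER_TIMEOUT (ms):", enable_fail_fast(s))
    finally:
        s.close()
```

With the option set, a pending send or receive on a connection to a peer that has vanished should fail with a timeout error once the limit expires, rather than waiting for the retransmission backoff to run out.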
// Jonas
------------------------------------------------------------------------
Sent from sourceforge.net because
opensaf-tickets@lists.sourceforge.net is subscribed to
https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change
settings at https://sourceforge.net/p/opensaf/admin/tickets/options.
Or, if this is a mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets