Re: [DRBD-user] oracle stop timeout while drbd resync

2016-08-31 Thread Mia Lueng
The environment has been recovered. I changed the Pacemaker stop-failure
action to "echo c >/proc/sysrq-trigger" so that the system reboots and
generates a vmcore when a resource stop fails.
I am sure the cause is that the oracle stop action stalls during the
DRBD resync. All the devices use the same replication link.
Here is the "foreach bt" output from the vmcore analysis:

PID: 6870   TASK: 8802c89b84c0  CPU: 14  COMMAND: "oracle"
 #0 [880281bd79c8] schedule at 8145a489
 #1 [880281bd7b10] do_get_write_access at a02ae72d [jbd]
 #2 [880281bd7bd0] journal_get_write_access at a02ae899 [jbd]
 #3 [880281bd7bf0] __ext3_journal_get_write_access at a0327aec [ext3]
 #4 [880281bd7c20] ext3_reserve_inode_write at a0317ef3 [ext3]
 #5 [880281bd7c50] ext3_mark_inode_dirty at a0318f71 [ext3]
 #6 [880281bd7c90] ext3_dirty_inode at a03190f7 [ext3]
 #7 [880281bd7cb0] __mark_inode_dirty at 8117e7e0
 #8 [880281bd7cf0] update_time at 81170c96
 #9 [880281bd7d20] touch_atime at 81170efb
#10 [880281bd7d60] generic_file_aio_read at 810f9e22
#11 [880281bd7e20] aio_rw_vect_retry at 81199bb4
#12 [880281bd7e50] aio_run_iocb at 8119b6c2
#13 [880281bd7e80] io_submit_one at 8119c1f0
#14 [880281bd7ec0] do_io_submit at 8119c3d8
#15 [880281bd7f80] system_call_fastpath at 81464592
RIP: 7f38ad4c36f7  RSP: 7fffc9ee77f0  RFLAGS: 00010206
RAX: 00d1  RBX: 81464592  RCX: 000152012960
RDX: 7fffc9ee77c0  RSI: 0001  RDI: 7f38af06
RBP: 000152012960   R8: 7fffc9ee77b0   R9: 7fffc9ee7750
R10: 7fffc9ee70d0  R11: 0206  R12: 0001553e0f80
R13: 7f38ac571c60  R14: 7fffc9ee77c0  R15: 7fffc9ee77e0
ORIG_RAX: 00d1  CS: 0033  SS: 002b
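
A trace like the one above can be summarised mechanically. The following is only a minimal Python sketch, assuming the frame layout that crash prints for "bt" as shown in this message; parse_frames and first_blocking_frame are hypothetical helper names, not part of crash, and the sample holds just a few of the frames from the trace.

```python
import re

# A few frames copied from the trace above (wrapped line rejoined).
BT_SAMPLE = """\
 #0 [880281bd79c8] schedule at 8145a489
 #1 [880281bd7b10] do_get_write_access at a02ae72d [jbd]
 #2 [880281bd7bd0] journal_get_write_access at a02ae899 [jbd]
 #3 [880281bd7bf0] __ext3_journal_get_write_access at a0327aec [ext3]
 #8 [880281bd7cf0] update_time at 81170c96
 #9 [880281bd7d20] touch_atime at 81170efb
#10 [880281bd7d60] generic_file_aio_read at 810f9e22
"""

# "#N [stack address] function at text address [optional module]"
FRAME_RE = re.compile(
    r"^\s*#(\d+) \[([0-9a-f]+)\] (\S+) at ([0-9a-f]+)(?: \[(\w+)\])?$"
)


def parse_frames(text):
    """Turn bt output into a list of dicts, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append({
                "index": int(match.group(1)),
                "function": match.group(3),
                # Frames without a [module] suffix belong to the core kernel.
                "module": match.group(5) or "kernel",
            })
    return frames


def first_blocking_frame(frames):
    """Return the innermost frame that is not schedule() itself."""
    for frame in frames:
        if frame["function"] != "schedule":
            return frame
    return None


if __name__ == "__main__":
    frames = parse_frames(BT_SAMPLE)
    blocked = first_blocking_frame(frames)
    print(blocked["function"], blocked["module"])
```

For the trace above this points at do_get_write_access in jbd, reached through touch_atime: the read path updates the access time and then sleeps waiting for write access to the ext3 journal.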


2016-09-01 7:48 GMT+08:00 Igor Cicimov :
>
>
> On Thu, Sep 1, 2016 at 9:02 AM, Igor Cicimov
>  wrote:
>>
>> On 1 Sep 2016 1:16 am, "Mia Lueng"  wrote:
>> >
>> > Yes, Oracle and DRBD are running under Pacemaker, in plain
>> > primary/secondary mode. I stopped the oracle resource while DRBD was
>> > resyncing, and oracle hung.
>> >
>> > 2016-08-31 14:38 GMT+08:00 Igor Cicimov
>> > :
>> > >
>> > >
>> > > On Wed, Aug 31, 2016 at 3:49 PM, Mia Lueng 
>> > > wrote:
>> > >>
>> > >> Hi:
>> > >> I have a cluster with four DRBD devices. I found that stopping oracle
>> > >> times out while DRBD is in the resync state.
>> > >> oracle is blocked as follows:
>> > >>
>> > >> oracle  6869  6844  0.0  0.0   71424 12616 ?  S   16:28 00:00:00 pipe_wait        /oracle/app/oracle/dbhome_1/bin/sqlplus @/tmp/ora_ommbb_shutdown.sql
>> > >> oracle  6870  6869  0.0  0.1 4431856 26096 ?  Ds  16:28 00:00:00 get_write_access oracleommbb (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
>> > >>
>> > >>
>> > >> drbd state
>> > >>
>> > >> 2016-08-30 16:33:32 Dump [/proc/drbd] ...
>> > >> =
>> > >> version: 8.3.16 (api:88/proto:86-97)
>> > >> GIT-hash: bbf851ee755a878a495cfd93e1a76bf90dc79442 Makefile.in build
>> > >> by drbd@build 2012-06-07 16:03:04
>> > >> 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B r-
>> > >>    ns:2777568 nr:0 dw:492604 dr:3305833 al:4761 bm:439 lo:31 pe:613 ua:0 ap:31 ep:1 wo:d oos:4144796
>> > >>    [==>.] sync'ed: 35.7% (4044/6280)M
>> > >>    finish: 0:10:19 speed: 6,680 (3,664) K/sec
>> > >> 1: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent B r-
>> > >>    ns:3709600 nr:0 dw:854764 dr:7632085 al:7689 bm:3401 lo:38 pe:3299 ua:38 ap:0 ep:1 wo:d oos:6204676
>> > >>    [===>] sync'ed: 41.5% (6056/10340)M
>> > >>    finish: 0:22:14 speed: 4,640 (10,016) K/sec
>> > >> 2: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B r-
>> > >>    ns:3968883 nr:0 dw:127937 dr:5179641 al:190 bm:304 lo:1 pe:139 ua:0 ap:7 ep:1 wo:d oos:2124792
>> > >>    [>...] sync'ed: 66.3% (2072/6144)M
>> > >>    finish: 0:06:12 speed: 5,692 (6,668) K/sec
>> > >> 3: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B r-
>> > >>    ns:89737 nr:0 dw:439073 dr:2235186 al:724 bm:35 lo:0 pe:45 ua:0 ap:7 ep:1 wo:d oos:8131104
>> > >>    [>] sync'ed:  1.6% (7940/8064)M
>> > >>    finish: 10:44:09 speed: 208 (204) K/sec (stalled)
>> > >>
>> > >> Is this a known bug, and is it fixed in a later version?
>> > >> ___
>> > >> drbd-user mailing list
>> > >> drbd-user@lists.linbit.com
>> > >> http://lists.linbit.com/mailman/listinfo/drbd-user
>> > >
>> > >
>> > > Maybe provide more details about the term "cluster" you are using. Do
>>

Re: [DRBD-user] oracle stop timeout while drbd resync

2016-08-31 Thread Igor Cicimov
Another thing I forgot: I find it odd that the sync for only one of the
devices is stalled. Are they all using the same replication link? Can you
see any networking issues or network card errors?
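
A stalled device can also be spotted mechanically. The following is a minimal Python sketch, assuming the /proc/drbd layout of DRBD 8.3 as pasted in this thread, with the wrapped lines rejoined; the function name parse_proc_drbd and the dictionary keys are made up for illustration, and the sample holds only devices 0 and 3 from the dump.

```python
import re

# Devices 0 and 3 from the dump in this thread, wrapped lines rejoined.
SAMPLE = """\
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B r-
    ns:2777568 nr:0 dw:492604 dr:3305833 al:4761 bm:439 lo:31 pe:613 ua:0 ap:31 ep:1 wo:d oos:4144796
    finish: 0:10:19 speed: 6,680 (3,664) K/sec
 3: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B r-
    ns:89737 nr:0 dw:439073 dr:2235186 al:724 bm:35 lo:0 pe:45 ua:0 ap:7 ep:1 wo:d oos:8131104
    finish: 10:44:09 speed: 208 (204) K/sec (stalled)
"""

DEVICE_RE = re.compile(r"^\s*(\d+): cs:(\S+)")   # "N: cs:<connection state>"
COUNTER_RE = re.compile(r"\b(pe|oos):(\d+)")     # pending requests, out-of-sync KiB
SPEED_RE = re.compile(r"speed: ([\d,]+) \(")     # current resync speed


def parse_proc_drbd(text):
    """Map device minor number -> connection state, counters and stall flag."""
    devices = {}
    current = None
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            current = {"cs": match.group(2), "stalled": False}
            devices[int(match.group(1))] = current
            continue
        if current is None:
            continue  # header lines before the first device
        for key, value in COUNTER_RE.findall(line):
            current[key] = int(value)
        match = SPEED_RE.search(line)
        if match:
            current["speed_k_per_sec"] = int(match.group(1).replace(",", ""))
            current["stalled"] = "(stalled)" in line
    return devices


if __name__ == "__main__":
    for minor, info in sorted(parse_proc_drbd(SAMPLE).items()):
        print(minor, info)
```

Running something like this periodically against /proc/drbd would show whether it is always the same device that stalls, which helps tell a link problem from a per-device one.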


Re: [DRBD-user] oracle stop timeout while drbd resync

2016-08-31 Thread Igor Cicimov

Also look for errors from crm in syslog, and check the oracle log for errors too.


Re: [DRBD-user] oracle stop timeout while drbd resync

2016-08-31 Thread Igor Cicimov

What OS is this on? Can you please paste the output of "crm status" (or the
pcs equivalent if you are on RHEL 7) and "crm_mon -Qrf1"?


Re: [DRBD-user] oracle stop timeout while drbd resync

2016-08-31 Thread Mia Lueng
Yes, Oracle and DRBD are running under Pacemaker, in plain
primary/secondary mode. I stopped the oracle resource while DRBD was
resyncing, and oracle hung.



Re: [DRBD-user] DRBD9: full-mesh and managed resources

2016-08-31 Thread Roberto Resoli
On 31/08/2016 15:06, Lars Ellenberg wrote:
> Instead of bridging,
> explicit routes could be another option.
> ip route add .../.. dev ...
> 
> Lars

I already tried that, and it didn't work for me. I guess that if there are
two interfaces with the same IP, the DRBD processes will listen on only one
of them.

Maybe I'm wrong.

rob




Re: [DRBD-user] DRBD9: full-mesh and managed resources

2016-08-31 Thread Lars Ellenberg
On Mon, Aug 22, 2016 at 10:43:18AM +0200, Roberto Resoli wrote:
> On 18/08/2016 14:03, Veit Wahlich wrote:
> > On Thursday, 18.08.2016, 12:33 +0200, Roberto Resoli wrote:
> >> On 18/08/2016 10:09, Adam Goryachev wrote:
> >>> I can't comment on the DRBD related portions, but can't you add both
> >>> interfaces on each machine to a single bridge, and then configure the IP
> >>> address on the bridge. Hence each machine will only have one IP address,
> >>> and the other machines will use their dedicated network to connect to
> >>> it. I would assume the overhead of the bridge inside the kernel would be
> >>> minimal, but possibly not, so it might be a good idea to test it out.
> >>
> >> Very clever suggestion!
> >>
> >> Many thanks, will try and report.
> > 
> > If you try this, take care to enable STP on the bridges, or this will
> > create loops.
> 
> Yes, this worked immediately as expected.
> 
> > Also STP will give you redundancy in case a link breaks and will try to
> > determine the shortest path between nodes.
> 
> I confirm. With three nodes and three links, STP of course blocks one of
> the three links, with the root bridge forwarding traffic between the other two.
> 
> It is possible to control which bridge becomes root using the
> "setbridgeprio" command of brctl.
> 
> > But the shortest link is not guaranteed. Especially after recovery from
> > a network link failure.
> > You might want to monitor each node for the shortest path.
> 
> Using STP of course has the side effect of not using one of the three
> links (it is the price to pay for failover).
> 
> I tried to disable STP, blocking at the same time (with a simple ebtables
> rule) the forwarding through the bridge in order to avoid
> loops/broadcast storms. In the resulting topology every link carries
> only the traffic of the two nodes it connects (at the expense of having
> no failover).
> 
> It is very handy to monitor that all is working correctly using:
> 
> watch brctl showstp 
> 
> and
> 
> watch brctl showmacs 
> 
> I post here the configuration I ended up using, for reference:
> (I put it in a "drbd-interfaces" file, referenced in
> "/etc/network/interfaces" using the "source" directive)
> 
> ===
> auto drbdbr
> iface drbdbr inet static
>     address  
>     netmask  255.255.255.0
>     bridge_ports eth2 eth3
>     bridge_stp off
>     bridge_ageing 30
>     bridge_fd 5
>     # Only with stp on
>     # node1 and node2 are preferred
>     #bridge_bridgeprio 1000
>     # Only with stp off
>     pre-up ifconfig eth2 mtu 9000 && ifconfig eth3 mtu 9000
>     up  ebtables -I FORWARD --logical-in drbdbr -j DROP
>     down ebtables -D FORWARD --logical-in drbdbr -j DROP
> ===

Instead of bridging,
explicit routes could be another option.
ip route add .../.. dev ...

Lars
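
As an illustration of the explicit-route idea for a three-node full mesh, here is a minimal, untested Python sketch that only prints the "ip route add" commands each node would need. Everything specific in it is an assumption made up for the example: the 10.0.0.x addresses, the node names, and which of eth2/eth3 is cabled to which peer. It is not the configuration used by anyone in this thread.

```python
# Hypothetical DRBD address of each node (one address per node).
NODE_IP = {
    "node1": "10.0.0.1",
    "node2": "10.0.0.2",
    "node3": "10.0.0.3",
}

# Hypothetical cabling: on each node, which local interface is the
# dedicated link to which peer.
CABLING = {
    "node1": {"node2": "eth2", "node3": "eth3"},
    "node2": {"node1": "eth2", "node3": "eth3"},
    "node3": {"node1": "eth2", "node2": "eth3"},
}


def route_commands(node):
    """Return one host route per peer, pinned to the dedicated link."""
    return [
        f"ip route add {NODE_IP[peer]}/32 dev {iface}"
        for peer, iface in sorted(CABLING[node].items())
    ]


if __name__ == "__main__":
    for node in sorted(CABLING):
        print(f"# on {node}")
        for command in route_commands(node):
            print(command)
```

With host routes like these, each link carries only the traffic between the two nodes it connects, so, as with the STP-off bridge setup quoted above, there is no failover if a link breaks.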



Re: [DRBD-user] oracle stop timeout while drbd resync

2016-08-31 Thread Igor Cicimov
Could you provide more details about what you mean by "cluster"? Do you
have DRBD under the control of a CRM such as Pacemaker? If so, are you
perhaps running DRBD in dual-primary mode? And when does this state occur,
and under what conditions (for example, after restarting a node)?