from:"Jim Senicka"

Re: [Veritas-ha] Cluster Interconnect cables: Direct connect or VLANs?

2009-09-16 Thread Jim Senicka

My only addition to the comments by the esteemed gentleman from Virginia is to 
make sure you have a solid practice in place to manage cluster IDs when you go 
VLAN, as there may be cases when your network people “cross the streams”.
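
One way to make such a practice concrete is a periodic audit that flags cluster IDs used by more than one cluster. The following is a minimal Python sketch, assuming the contents of each cluster's /etc/llttab have already been collected and that each file carries a `set-cluster <id>` line. The cluster names, the sample file contents and the helper names are made up for illustration.

```python
import re
from collections import defaultdict

def cluster_id(llttab_text):
    """Return the integer after 'set-cluster' in llttab text, or None if absent."""
    for line in llttab_text.splitlines():
        match = re.match(r"\s*set-cluster\s+(\d+)", line)
        if match:
            return int(match.group(1))
    return None

def duplicate_cluster_ids(llttabs):
    """Map each reused cluster ID to the sorted names of the clusters using it."""
    users = defaultdict(list)
    for name, text in llttabs.items():
        cid = cluster_id(text)
        if cid is not None:
            users[cid].append(name)
    return {cid: sorted(names) for cid, names in users.items() if len(names) > 1}

# Hypothetical sample data: two clusters accidentally sharing ID 7.
SAMPLE = {
    "oracle-prod": "set-node node1\nset-cluster 7\nlink eth1 eth1 - ether - -\n",
    "web-prod": "set-node web1\nset-cluster 7\nlink eth1 eth1 - ether - -\n",
    "test": "set-node t1\nset-cluster 12\nlink eth1 eth1 - ether - -\n",
}

if __name__ == "__main__":
    print(duplicate_cluster_ids(SAMPLE))
```

Any ID reported here would need to be changed on one of the clusters before both share a heartbeat VLAN.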

 

 

 

From: veritas-ha-boun...@mailman.eng.auburn.edu 
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Eric Hennessey
Sent: Wednesday, September 16, 2009 11:59 AM
To: Jon Price; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Cluster Interconnect cables: Direct connect or VLANs?

 

The configuration you’re considering – running your cluster interconnects over 
two separate VLANs – is actually our preferred and recommended method, even 
when deploying a simple 2-node cluster.  While using direct connections between 
cluster nodes is simple and convenient, it becomes problematic if you decide to 
add a node to the cluster.
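
A rough illustration of why direct connections stop scaling, written as a small Python sketch. It assumes every node must reach every other node on each heartbeat network, so point-to-point cabling needs one cable per node pair while a switched VLAN needs one port per node. The function names are illustrative only.

```python
def direct_cables(nodes, networks=2):
    """Point-to-point cables needed so every node pair is linked on each network."""
    return networks * nodes * (nodes - 1) // 2

def switch_ports(nodes, networks=2):
    """Switch ports needed when each node has one port on each heartbeat VLAN."""
    return networks * nodes

for n in (2, 3, 4):
    print(n, direct_cables(n), switch_ports(n))
```

With two nodes the direct approach needs only two cables, but every added node also needs an extra NIC per existing peer, which is where the VLAN design pays off.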

 

Rest easy with your design. :-)

 

Eric

 

From: veritas-ha-boun...@mailman.eng.auburn.edu 
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Jon Price
Sent: Tuesday, September 15, 2009 3:58 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Cluster Interconnect cables: Direct connect or VLANs?

 


Hi,

We run Veritas Cluster 5.0 and also have Storage Foundation for Oracle. 

Currently we use direct connect cables between the two nodes in our Veritas 
Cluster for the "heartbeat".
However, we are switching to new systems and running the direct connect cables 
is more difficult than it used to be.
So, we are considering the use of two VLANs for this purpose. I believe that 
traffic on these two VLANs is limited to only Cluster heartbeat connections 
(though not just ours).

What is the downside of using VLANs for the heartbeat?
In what scenarios could problems develop?

I'm concerned that if our network has a serious problem and "goes down", 
each Node in the Cluster might be isolated and both Nodes might import the disk 
groups, mount volumes, etc., resulting in data corruption.

Is data corruption a possibility if the entire network goes down or in other 
scenarios?

Does Veritas also use "quorum" or any other methods to protect against split 
brain induced  damage?


Thanks

___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] LLT heartbeat redundancy

2009-05-03 Thread Jim Senicka

LLT is designed to use "jeopardy" to detect the difference between
single link fail and dual link fail in most situations. Having a single
mesh may remove this capability.

Let me check on this with engineering and see if we have any more up to
date recommendations


-Original Message-
From: veritas-ha-boun...@mailman.eng.auburn.edu
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Imri
Zvik
Sent: Sunday, May 03, 2009 12:18 PM
To: Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] LLT heartbeat redundancy

On Sunday 03 May 2009 19:03:16 Jim Senicka wrote:
> This is not a limitation, as you had two independent failures. Bonding
> would remove the ability to discriminate between a link and a node
> failure.

I didn't understand this one - With bonding I can maintain full mesh
topology - No matter which one of the links fails, if a node still has at
least one active link, LLT will still be able to see all the other nodes.
This achieves greater HA than without the bonding.


> My feeling is in the scenario you describe, VCS is operating properly,
> and it is not a limitation.

Of course it is operating properly - that's how it was designed to work :)
I'm just saying that the cluster could be more redundant if it wasn't
designed that way :)

> If you have issues with port or cable failures, add a low pri connection
> on a third network.




Re: [Veritas-ha] LLT heartbeat redundancy

2009-05-03 Thread Jim Senicka

This is not a limitation, as you had two independent failures. Bonding
would remove the ability to discriminate between a link and a node
failure. 
My feeling is in the scenario you describe, VCS is operating properly,
and it is not a limitation.
If you have issues with port or cable failures, add a low pri connection
on a third network.

-Original Message-
From: Imri Zvik [mailto:im...@inter.net.il] 
Sent: Sunday, May 03, 2009 11:57 AM
To: Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] LLT heartbeat redundancy

On Sunday 03 May 2009 18:25:08 Jim Senicka wrote:
> You had 2 failures. No real way to design around that.
> GAB "visible" would prevent bad things from occurring.

Thank you for the fast response :)

Well, in Linux I can use the bonding module to aggregate the interfaces and
work around this limitation. I've read in this discussion:
http://www.mail-archive.com/veritas-ha@mailman.eng.auburn.edu/msg01016.html
that since 5.0MP3 there is a cross-platform solution (I need this for
Solaris 10). Do you happen to know more about this feature?

Thanks!

P.S.

Does anyone know if Sun Cluster has the same limitation?


Re: [Veritas-ha] LLT heartbeat redundancy

2009-05-03 Thread Jim Senicka

You had 2 failures. No real way to design around that.
GAB "visible" would prevent bad things from occurring.



-Original Message-
From: veritas-ha-boun...@mailman.eng.auburn.edu
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Imri
Zvik
Sent: Sunday, May 03, 2009 11:20 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] LLT heartbeat redundancy

Hi,

As far as I understand the manuals, the LLT heartbeat links should be
isolated from each other. Now, consider the following scenario - Node1 is
connected with two links, each one to a separate switch. We will call them
li...@node1@sw1 and li...@node1@sw2.
Node4 is also connected with 2 links: li...@node4@sw1 and li...@node4@sw2.

Now, if Node1 loses li...@node1@sw1 and Node4 loses li...@node4@sw2, they
can no longer see each other, although each still has a valid heartbeat link.

Am I missing something? Is there a way around this issue?

-- imriz
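
The scenario in this message can be modelled in a few lines. This is a minimal Python sketch, assuming the two switches are fully isolated from each other (no inter-switch path), as the manuals are said to recommend. The `links` dictionary and the `can_see` helper are illustrative names, not part of LLT.

```python
def can_see(links, a, b):
    """Two nodes see each other if both still have a live link on a common switch."""
    return bool(links[a] & links[b])

# Each node starts with one link to each of two isolated switches.
links = {"node1": {"sw1", "sw2"}, "node4": {"sw1", "sw2"}}
print(can_see(links, "node1", "node4"))

links["node1"].discard("sw1")   # node1 loses its link to sw1
links["node4"].discard("sw2")   # node4 loses its link to sw2
print(can_see(links, "node1", "node4"))
```

After the two independent failures each node still has one live link, but the remaining links sit on different switches, so the pair no longer shares a heartbeat path.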

Re: [Veritas-ha] SUMMARY: filesystem corruption after the cluster node reboot

2009-04-01 Thread Jim Senicka

Running a non-journaled file system in a cluster is always a bad idea,
as your recovery time is always affected by file system start-up tasks.
Running UFS in logging mode was usually a pretty big performance hit.
Why not VxFS?



-Original Message-
From: veritas-ha-boun...@mailman.eng.auburn.edu
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of
Aleksandr Nepomnyashchiy
Sent: Tuesday, March 31, 2009 6:07 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] SUMMARY: filesystem corruption after the cluster
node reboot

Many thanks to Tom Stephens for his help in troubleshooting.

What happened :
Both fs1 and fs2 became corrupted after the node crash. Most probably
VCS tried to fsck both; it succeeded with fs1 (size ~4G) but did not
complete within the timeout period on fs2 (size ~100G). So the
fsck of fs2 was killed and left nothing in the engine_A.log.


Suggested actions:
A) Implement UFS logging on both fs1 and fs2 - should eliminate the
file system corruption and the need for FSCK (I will definitely
implement this).
B) Increase the "OnlineTimeout" value for the "Mount" type from the
default of 300 seconds  (this should be considered carefully, can
cause troubles).


PS I was considering adding "-y" in FsckOpt but it doesn't make any
difference - the online script adds the "-y" option to fsck (regardless of
whether you specify it or not in the FsckOpt). This is the case for
online script version 2.9 from 02/13/01 18:15:47.



=== Please see the original post below ===



Dear VCS gurus,
Please help me to understand why only 1 out of 2 mount points came up
after the crash.

I can see in the log that fs1 was fsck-ed by VCS and brought online.
Was fsck even attempted on fs2? And if not why?

VCS is 2.0, both fs1 and fs2 are "ufs",  nothing in FsckOpt.


== engine_A.log from the healthy node =
TAG_E 2009/03/26 18:25:55 (node_d) VCS:13001:Resource(mnt_fs1): Output
of the completed operation (online)
mount: the state of /dev/vx/dsk/mydg/fs1 is not okay
   and it was attempted to be mounted read/write
mount: Please run fsck and try again
** /dev/vx/rdsk/mydg/fs1
** Last Mounted on /mount/fs1
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups

FILE SYSTEM STATE IN SUPERBLOCK IS WRONG; FIX?  yes

7324 files, 2158506 used, 1773622 free (4910 frags, 221089 blocks,
0.1% fragmentation)
TAG_E 2009/03/26 18:25:55 VCS:10298:Resource mnt_fs1 (Owner: unknown,
Group: srvgrA) is online on node_d (VCS initiated)
TAG_E 2009/03/26 18:30:07 (node_d) VCS:13003:Resource(mnt_fs2): Output
of the timedout operation (online)
mount: the state of /dev/vx/dsk/mydg/fs2 is not okay
   and it was attempted to be mounted read/write
mount: Please run fsck and try again
TAG_B 2009/03/26 18:30:07 (node_d) VCS:13012:Resource(mnt_fs2): online
procedure did not complete within the expected time.
TAG_D 2009/03/26 18:30:07 (node_d) VCS:13065:Agent is calling clean
for resource(mnt_fs2) because online did not complete within the
expected time.
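
Timeouts like the one on mnt_fs2 can be picked out of a long engine log mechanically. Here is a minimal Python sketch, assuming the line layout seen in the excerpt above and treating message ID 13012 as the "online procedure did not complete" marker because that is the text it carries here. The regular expression and the function name are illustrative.

```python
import re
from datetime import datetime

# Matches the start of an agent message as it appears in the excerpt above.
LINE = re.compile(
    r"^TAG_\w (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \((\w+)\) VCS:(\d+):Resource\((\w+)\)"
)

def online_timeouts(log_text):
    """Return (timestamp, node, resource) for each VCS:13012 message in the log."""
    hits = []
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match and match.group(3) == "13012":
            stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
            hits.append((stamp, match.group(2), match.group(4)))
    return hits

SAMPLE = """\
TAG_E 2009/03/26 18:25:55 (node_d) VCS:13001:Resource(mnt_fs1): Output
of the completed operation (online)
TAG_B 2009/03/26 18:30:07 (node_d) VCS:13012:Resource(mnt_fs2): online
procedure did not complete within the expected time.
"""

if __name__ == "__main__":
    for stamp, node, resource in online_timeouts(SAMPLE):
        print(stamp.isoformat(), node, resource)
```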



Thank you,
Aleksandr

Re: [Veritas-ha] SFCFSRAC - node with the highest nodeid panics after node with the lowest nodeid rejoins

2009-03-17 Thread Jim Senicka

What are your gabtab settings?

You seem to have two independent cluster generations.

You should have /sbin/gabconfig -c -n4 in gabtab.
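
A check like the one requested here can be scripted. The sketch below, in Python, extracts the seed count from gabtab text and compares it with the expected number of nodes. It assumes the `-nN` form shown above, and the helper names are illustrative.

```python
import re

def seed_count(gabtab_text):
    """Return N from a 'gabconfig -c -nN' line, or None if no seed count is set."""
    for line in gabtab_text.splitlines():
        if "gabconfig" in line:
            match = re.search(r"-n\s*(\d+)", line)
            if match:
                return int(match.group(1))
    return None

def seeding_matches(gabtab_text, cluster_nodes):
    """True when the gabtab seed count equals the number of nodes in the cluster."""
    return seed_count(gabtab_text) == cluster_nodes

print(seeding_matches("/sbin/gabconfig -c -n4\n", 4))
```

A seed count lower than the real node count is worth investigating, since it lets a subset of nodes form a cluster on its own.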




-Original Message-
From: Imri Zvik [mailto:im...@inter.net.il] 
Sent: Tuesday, March 17, 2009 10:16 AM
To: Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] SFCFSRAC - node with the highest nodeid panics
after node with the lowest nodeid rejoins

On Tuesday 17 March 2009 15:32:55 Jim Senicka wrote:
> A few questions
>
> 1. Do you have a support case open?

Yes, for over two weeks.

> 2. Do you reconnect the FC before the node boots?

Yes, FC is reconnected immediately after the panic.

> 3. Is the network available during boot time?

Yes.

>
> "GAB: port b is halting the system due to network failure" essentially
> means that VXFEN is connecting between two clusters with different
> generation numbers, which should only happen if the clusters booted
> independent of each other, then were joined at the network level

This is weird. As you can see from the logs I've attached before, the
cluster nodes 1, 2 and 3 were members, and node 0 rejoined them.




Re: [Veritas-ha] SFCFSRAC - node with the highest nodeid panics after node with the lowest nodeid rejoins

2009-03-17 Thread Jim Senicka

A few questions

1. Do you have a support case open?
2. Do you reconnect the FC before the node boots?
3. Is the network available during boot time?

"GAB: port b is halting the system due to network failure" essentially
means that VXFEN is connecting between two clusters with different
generation numbers, which should only happen if the clusters booted
independent of each other, then were joined at the network level




-Original Message-
From: veritas-ha-boun...@mailman.eng.auburn.edu
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Imri
Zvik
Sent: Tuesday, March 17, 2009 3:41 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] SFCFSRAC - node with the highest nodeid panics
after node with the lowest nodeid rejoins

Hi,

I have a 4-node SFCFSRAC cluster, running on Linux (RHEL 5 x86_64), with
SFCFSRAC version 5MP3RP2.

As part of my ATP, I've tried disconnecting node1 from the storage (by
shutting down its FC ports at the FC switch). The node panicked, and the
cluster did recognize the node failure, evicted it and reconfigured.
After the node panicked, I've restored its FC connection and booted the
node back into the cluster.

To my surprise, while node1 rejoined the cluster, node4 panicked with the
following message:
"GAB: port b is halting the system due to network failure".

There are no network issues, and this is consistent - each time node1
rejoins the cluster after a failure, node4 will panic with the same message.


During the test node3 was the master, and the last events logged on
node4 just before the crash are these:

Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20036 Port h gen 6e7c0c membership ;123
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20038 Port h gen 6e7c0c k_jeopardy 0
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20040 Port h gen 6e7c0c visible 0
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20036 Port w gen 6e7c11 membership ;123
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20038 Port w gen 6e7c11 k_jeopardy 0
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20040 Port w gen 6e7c11 visible 0
Mar 15 13:38:01 crmdb-rac-node4 Had[10552]: VCS INFO V-16-1-10077 Received new cluster membership
Mar 15 13:38:01 crmdb-rac-node4 Had[10552]: VCS ERROR V-16-1-10113 System crmdb-rac-node1 (Node '0') is in DDNA Membership - Membership: 0xe, Visible: 0x0
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20036 Port d gen 6e7c0d membership ;123
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20038 Port d gen 6e7c0d k_jeopardy 0
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20040 Port d gen 6e7c0d visible 0
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20036 Port f gen 6e7c14 membership ;123
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20038 Port f gen 6e7c14 k_jeopardy 0
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20040 Port f gen 6e7c14 visible 0
Mar 15 13:38:01 crmdb-rac-node4 kernel: GAB INFO V-15-1-20036 Port v gen 6e7c0f membership ;123
Mar 15 13:38:02 crmdb-rac-node4 kernel: GAB INFO V-15-1-20038 Port v gen 6e7c0f k_jeopardy 0
Mar 15 13:38:02 crmdb-rac-node4 kernel: GAB INFO V-15-1-20040 Port v gen 6e7c0f visible 0
Mar 15 13:38:02 crmdb-rac-node4 kernel: GAB INFO V-15-1-20032 Port a closed
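
The membership value in the V-16-1-10113 line can be decoded by hand or with a short helper. The Python sketch below assumes that bit i of the mask stands for LLT node ID i, which is consistent with the log naming crmdb-rac-node1 as Node '0' and showing membership ";123". The function name is illustrative.

```python
def nodes_in_mask(mask):
    """Return the node IDs whose bits are set in a membership bitmask."""
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]

print(nodes_in_mask(0xE))   # membership reported in the V-16-1-10113 line
print(nodes_in_mask(0x0))   # visible mask from the same line
```

Under that assumption 0xe decodes to node IDs 1, 2 and 3, with node 0 absent from the membership.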


Any ideas?



Re: [Veritas-ha] Q: best upgrade procedure for solaris & VCS

2009-02-05 Thread Jim Senicka

You cannot mix VCS versions.

You would need to shut down the 4.0 side, bring up the 5.0 side, bring up the 
service group(s), then upgrade the 4.0 side.

 

From: veritas-ha-boun...@mailman.eng.auburn.edu 
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Steffen Winther 
Sørensen
Sent: Thursday, February 05, 2009 6:48 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Q: best upgrade procedure for solaris & VCS

 

Hi

 

I need to upgrade solaris 9->10 for an App running on a VCS 4.0 cluster, what 
will be the best procedure to minimize the downtime for the App?

 

The VCS cluster is a dual node setup with all of the App on shared SAN storage, 
no HB disk, only NIC HBs are used. Each node holds Solaris and VCS software on 
its own root disk mirrors.

 

I'm thinking about something like this:

 

- take the inactive node B out of the cluster (hastop -local -evacuate) while 
keeping the App running on node A

- on node B do the following:

  * unencapsulate the root disk

  * remove VCS&VxVM

  * upgrade solaris

  * install new VCS & VxVM for solaris 10, e.g. 4.1mp1 or 5.0

  * encapsulate root disk

- bring new VCS node B online

- get node B into the cluster with node A (will this work, nodeA VCS 4.0, node 
B VCS 5.0?)

- fail over the App to the new VCS node B (hastop -sys nodeA -evacuate)

- repeat the whole upgrade procedure on node A [and possibly fail back to node 
A, just to be sure it works after end of upgrade :]

 

I'm just wondering if...:

1. is it possible to have different versions of VCS on my cluster nodes at one 
point?

2. are there any known issues with VCS 5.0 and VCS 4.0 together?

3. how much config does node B (v.5.0) need to become a member of the v4.0 
cluster?

 

I found this: http://bbs.chinaunix.net/archiver/?tid-748265.html

talking about how to do the upgrade while taking the whole cluster down and leaving 
the App up on the current active node.

Only I would like to use the cluster to move the App as fast as possible over 
to the upgraded node B, once that is upgraded, instead of having to manually 
fail the App over.

 

 

dsB:/> pkginfo -l VRTSvcs
   PKGINST:  VRTSvcs
      NAME:  VERITAS Cluster Server
  CATEGORY:  optional
      ARCH:  sparc
   VERSION:  4.0
   BASEDIR:  /
    VENDOR:  VERITAS Software Corp.
      DESC:  VERITAS Cluster Server
    PSTAMP:  4.0 01/08/04-22:59:01
  INSTDATE:  Feb 23 2005 08:46
    STATUS:  completely installed
     FILES:      150 installed pathnames
                  23 shared pathnames
                   2 linked files
                  43 directories
                  74 executables
              151871 blocks used (approx)

 

dsB:/> uname -a

SunOS dsB 5.9 Generic_118558-02 sun4u sparc SUNW,Sun-Fire-480R

 

TIA!

 

/Steffen Winther Soerensen

___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] Application Resource with Start/Stop but no Monitor script possible?

2009-02-04 Thread Jim Senicka

Simplest would be to point these resources to use a file in /tmp so it gets 
cleared on server restart.
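
The marker-file idea could look roughly like this. It is a minimal Python sketch, assuming a script-based monitor that reports online with exit code 110 and offline with 100. The file naming scheme and the function names are hypothetical, and whether the temp directory is really emptied at boot depends on the platform.

```python
import os
import tempfile

ONLINE, OFFLINE = 110, 100  # exit codes assumed for a script-based monitor

def flag_path(resource, directory=None):
    """Location of the marker file; defaults to the system temp directory."""
    return os.path.join(directory or tempfile.gettempdir(), f"{resource}.started")

def start(resource, directory=None):
    """Record that the start script ran by creating the marker file."""
    with open(flag_path(resource, directory), "w"):
        pass

def stop(resource, directory=None):
    """Remove the marker file; a missing file is not an error."""
    try:
        os.remove(flag_path(resource, directory))
    except FileNotFoundError:
        pass

def monitor(resource, directory=None):
    """Report ONLINE while the marker exists, OFFLINE otherwise."""
    return ONLINE if os.path.exists(flag_path(resource, directory)) else OFFLINE
```

The monitor then only reflects whether the start script ran since the last reboot, not whether the application is actually alive, which is the trade-off discussed in this thread.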

-Original Message-
From: Kubota, Harald (GTS Pac Rim) [mailto:harald_kub...@ml.com] 
Sent: Wednesday, February 04, 2009 8:26 PM
To: Jim Senicka; A Darren Dunham; Veritas-ha@mailman.eng.auburn.edu
Subject: RE: [Veritas-ha] Application Resource with Start/Stop but no 
Monitor script possible?

> But would still get faulted, and cause offline up the dependency tree,
> unless you muck with FaultHandling

Right. The problem is: the application is monitored by the application
owner. We want to get alerts if anything fails though, EXCEPT for those
applications. Since it's a cluster used by many people, the mucking
around would be quite extensive, excluding some application resources
from our monitoring.
Since we want to start up the resources when a group goes online or when
it fails over, this is as it seems a non-trivial issue.

I cannot wait for VCS 6 to fix this :-)

Harald




Re: [Veritas-ha] Application Resource with Start/Stop but no Monitor script possible?

2009-02-04 Thread Jim Senicka

But would still get faulted, and cause offline up the dependency tree,
unless you muck with FaultHandling

-Original Message-
From: veritas-ha-boun...@mailman.eng.auburn.edu
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of A Darren
Dunham
Sent: Wednesday, February 04, 2009 3:38 PM
To: Veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Application Resource with Start/Stop but no
Monitor script possible?

On Tue, Feb 03, 2009 at 01:47:34PM +0900, Kubota, Harald (GTS Pac Rim) wrote:
> Is it possible to use an application resource with a start and stop
> script and no monitor script and VCS will trust the start script and
> believe the application is running although it's actually not? This
> might make VCS report a wrong status (ONLINE) for the resource although
> it's dead, but this is ok.

Couldn't you just set the resources to non-critical?  They'd still be
monitored, but it wouldn't cause a failover.

-- 
Darren

Re: [Veritas-ha] Removing VCS group

2009-02-02 Thread Jim Senicka

No. It means you do not have to do that. 


Sent from my Nokia E62 handheld by goodlink.


 -Original Message-
From:   i man [mailto:imanuk2...@googlemail.com]
Sent:   Monday, February 02, 2009 08:07 AM US Mountain Standard Time
To: Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject:Re: [Veritas-ha] Removing VCS group

Jim,
does it mean that I would need to do the same activity of removing diskgroup
components and putting them into the spare pool on both parts of the cluster?
Ciao.

On Mon, Feb 2, 2009 at 2:15 PM, Jim Senicka wrote:

> The diskgroup is destroyed. All info about a VxVM diskgroup is in the dg,
> so no need to do anything else (no info is on the host).
>
> In straight failover VxVM, the only tie point between VCS and VxVM is the
> VCS agent that imports and deports specified diskgroups. VxVM has no
> knowledge of VCS and VCS really only knows the name of a DG it is supposed
> to manage.
>
>
>
>
> Sent from my Nokia E62 handheld by goodlink.
>
>
>  -Original Message-
> From:   i man [mailto:imanuk2...@googlemail.com]
> Sent:   Monday, February 02, 2009 06:18 AM US Mountain Standard Time
> To: veritas-ha@mailman.eng.auburn.edu
>  Subject:Re: [Veritas-ha] Removing VCS group
>
> Thankyou to all for your help.
>
> Now I have some queries regarding the cluster.
>
> I have imported and destroyed the diskgroup on one system of the cluster.
>
> 1. Do I have to do it on both the systems of the cluster ?
>
> We have multipathing enabled on the systems.
>
> # vxdmpadm listctlr all
> CTLR-NAME   ENCLR-TYPE   STATE     ENCLR-NAME
> =============================================
> c7          EMC          ENABLED   MC0
> c6          EMC          ENABLED   MC0
> c0          Disk         ENABLED   Disk
> c7          EMC          ENABLED   MC1
> c6          EMC          ENABLED   MC1
> I am still a little confused about how VxVM and VCS integrate. Can
> somebody send me a link that shows how they fit together, so that I
> have a better understanding?
>
> Ciao.
>
>
> On Fri, Jan 30, 2009 at 11:29 AM, Jim Senicka wrote:
>
> > Removal of the service group has zero effect on the storage. You need to
> > use appropriate VxVM commands to manage the disk group. The vxprint
> > command is VxVM and has nothing to do with VCS.
> > Removing the service group was fine. Now you need to complete the VxVM
> > work.
> >
> > Sent from my Nokia E62 handheld by goodlink.
> >
> >
> >  -Original Message-
> > From:   i man [mailto:imanuk2...@googlemail.com]
> > Sent:   Friday, January 30, 2009 04:26 AM US Mountain Standard Time
> > To: veritas-ha@mailman.eng.auburn.edu
> > Subject:[Veritas-ha] Removing VCS group
> >
> > all,
> >
> > I think I'm in a bit of trouble.
> >
> > I'm trying to remove a cluster service group which has a Veritas disk
> > group configured. My task is to free up the disks by removing the SG
> > and DG and to move them to the free pool. From the cluster GUI, I have
> > removed the resources and the SG. Some questions regarding the same...
> >
> >
> >   1. Did the Veritas disk group get deleted automatically when I removed
> >   the cluster components?
> >   2. I could not see any service group through the vxprint command now.
> >   3. How can I now move the disks out of the service group pool?
> >
> > Ciao.
> >
>


Re: [Veritas-ha] Removing VCS group

2009-02-02 Thread Jim Senicka

The diskgroup is destroyed. All info about a VxVM diskgroup is in the dg, so no 
need to do anything else (no info is on the host). 

In straight failover VxVM, the only tie point between VCS and VxVM is the VCS 
agent that imports and deports specified diskgroups. VxVM has no knowledge of 
VCS and VCS really only knows the name of a DG it is supposed to manage. 

Sent from my Nokia E62 handheld by goodlink.

 -Original Message-
From:   i man [mailto:imanuk2...@googlemail.com]
Sent:   Monday, February 02, 2009 06:18 AM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject:Re: [Veritas-ha] Removing VCS group

Thank you to all for your help.

Now I have some queries regarding the cluster.

I have imported and destroyed the diskgroup on one system of the cluster.

1. Do I have to do it on both the systems of the cluster ?

We have multipathing enabled on the systems.

# vxdmpadm listctlr all
CTLR-NAME   ENCLR-TYPE   STATE     ENCLR-NAME
=============================================
c7          EMC          ENABLED   MC0
c6          EMC          ENABLED   MC0
c0          Disk         ENABLED   Disk
c7          EMC          ENABLED   MC1
c6          EMC          ENABLED   MC1

I am still a little confused about how VxVM and VCS integrate. Can
somebody send me a link that shows how they fit together, so that I
have a better understanding?

Ciao.

On Fri, Jan 30, 2009 at 11:29 AM, Jim Senicka wrote:

> Removal of the service group has zero effect on the storage. You need to
> use appropriate VxVM commands to manage the disk group. The vxprint command
> is VxVM and has nothing to do with VCS.
> Removing the service group was fine. Now you need to complete the VxVM
> work.
>
>
> Sent from my Nokia E62 handheld by goodlink.
>
>
>  -Original Message-
> From:   i man [mailto:imanuk2...@googlemail.com]
> Sent:   Friday, January 30, 2009 04:26 AM US Mountain Standard Time
> To: veritas-ha@mailman.eng.auburn.edu
> Subject:[Veritas-ha] Removing VCS group
>
> all,
>
> I think I'm in a bit of trouble.
>
> I'm trying to remove a cluster service group which has a Veritas disk group
> configured. My task is to free up the disks by removing the SG and DG and
> to move them to the free pool. From the cluster GUI, I have removed the
> resources and the SG. Some questions regarding the same...
>
>
>   1. Did the Veritas disk group get deleted automatically when I removed
>   the cluster components?
>   2. I could not see any service group through the vxprint command now.
>   3. How can I now move the disks out of the service group pool?
>
> Ciao.
>


Re: [Veritas-ha] Removing VCS group

2009-01-30 Thread Jim Senicka

Removal of the service group has zero effect on the storage. You need to use 
appropriate VxVM commands to manage the disk group. The vxprint command is VxVM 
and has nothing to do with VCS. 
Removing the service group was fine. Now you need to complete the VxVM work. 


Sent from my Nokia E62 handheld by goodlink.


 -Original Message-
From:   i man [mailto:imanuk2...@googlemail.com]
Sent:   Friday, January 30, 2009 04:26 AM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject:[Veritas-ha] Removing VCS group

all,

I think I'm in a bit of trouble.

I'm trying to remove a cluster service group which has a Veritas disk group
configured. My task is to free up the disks by removing the SG and DG and
to move them to the free pool. From the cluster GUI, I have removed the
resources and the SG. Some questions regarding the same...


   1. Did the Veritas disk group get deleted automatically when I removed
   the cluster components?
   2. I could not see any service group through the vxprint command now.
   3. How can I now move the disks out of the service group pool?

Ciao.


Re: [Veritas-ha] VCS Configuration 1

2009-01-21 Thread Jim Senicka

Also,

The entry points for any given resource type are called by the agent
framework of that particular agent.

So if you built an ABRA agent, it would call the entry points based on
the desired state of a given ABRA resource.

If no ABRA resource is defined, no entry points will ever be called.

 

From: veritas-ha-boun...@mailman.eng.auburn.edu
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Jim
Senicka
Sent: Wednesday, January 21, 2009 4:21 PM
To: i man; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] VCS Configuration 1

 

Comments below with JS>>>

 

From: veritas-ha-boun...@mailman.eng.auburn.edu
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of i man
Sent: Wednesday, January 21, 2009 2:36 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] VCS Configuration 1

 

All,

 

I am trying to configure a VCS resource. Have some confusion regarding
the below points.

 

1)  I have created the online, offline, monitor, clean scripts. Can
anybody explain to me how these scripts are called by VCS?

JS>>> They are called when a resource of that type is configured and
needs to do specific state changes. Once you have your type definition,
and an ABRAAgent in the ABRA directory, you then need to create a
resource of that type in the main.cf

 

 

I know I have defined the .cf file for the application but it seems they
are not called from the ArgList parameter. I cross checked with other
running applications as well and seems they are not called anywhere.
Does arglist by default call these scripts ?  I mean I was expecting
these scripts to be called at any configuration file.

JS>>> Huh? Sorry. The entry point scripts for a resource type are only
called when a resource of that type needs to be controlled. So unless
you have a resource of type Oracle defined in a service group, the
Oracle entry points are not called. For that matter, the OracleAgent is
not even started.

 

My .cf file looks like below

 

# more ABRA.cf
type ABRA (
        static int RestartLimit = 2
        static str ArgList[] = { Vandalhome, stupiduser }
        str Vandalhome
        str stupiduser
)
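
To show where ArgList comes into play, here is a minimal Python sketch of an entry point reading its arguments. It assumes the older script-agent convention in which the framework passes the resource name first, followed by the ArgList attribute values in the order declared in the type definition. Newer agent versions may pass the values in a different layout, so treat the parsing as an assumption. The function name and the sample values are hypothetical.

```python
import sys

def parse_entry_point_args(argv):
    """Split an entry point's arguments into resource name and ArgList values."""
    resource, vandalhome, stupiduser = argv[1:4]
    return {"resource": resource, "Vandalhome": vandalhome, "stupiduser": stupiduser}

if __name__ == "__main__" and len(sys.argv) >= 4:
    print(parse_entry_point_args(sys.argv))
```

ArgList therefore does not call the scripts. It only determines which attribute values the agent hands to whichever entry point it decides to run.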

2)  One of the reasons for the above question is that I think all my
applications do not have a proper Clean procedure, as the clean script
does not make sense to me. Just wanted to check if Clean is implemented
on the system or not.

 

JS>>> You would have had to implement the clean procedure yourself.

 

3)  What agent attribute would be best suited for the resource to
wait for specific interval of time before starting the procedure.

JS>>> The numerical value returned by online or offline sets the number of
seconds before monitor is called.

 

I'm toying with the following attributes.

 

OnlineWaitLimit : But SADG Says "Number of monitor intervals to wait
after completing the online procedure, and before the resource becomes
online." : My requirement is to wait before starting the online
procedure

 

OnlineTimeout: Not convinced with this as well

 

ConfInterval: Had seen the implementation of this parameter in main.cmd
so not happy about it.

 

Ciao

___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] VCS Configuration 1

2009-01-21 Thread Jim Senicka

Comments below with JS>>>

From: veritas-ha-boun...@mailman.eng.auburn.edu
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of i man
Sent: Wednesday, January 21, 2009 2:36 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] VCS Configuration 1

All,

I am trying to configure a VCS resource. Have some confusion regarding
the below points.

1)  I have created the online, offline, monitor, clean scripts. Can
anybody explain to me how these scripts are called by VCS?

JS>>> They are called when a resource of that type is configured and
needs to do specific state changes. Once you have your type definition,
and an ABRAAgent in the ABRA directory, you then need to create a
resource of that type in the main.cf

I know I have defined the .cf file for the application but it seems they
are not called from the ArgList parameter. I cross checked with other
running applications as well and seems they are not called anywhere.
Does arglist by default call these scripts ?  I mean I was expecting
these scripts to be called at any configuration file.

JS>>> Huh? Sorry. The entry point scripts for a resource type are only
called when a resource of that type needs to be controlled. So unless
you have a resource of type Oracle defined in a service group, the
Oracle entry points are not called. For that matter, the OracleAgent is
not even started.

My .cf file looks like below

# more ABRA.cf
type ABRA (
static int RestartLimit = 2
static str ArgList[] = { Vandalhome, stupiduser }
str Vandalhome
str stupiduser 
)
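
As a rough illustration of how ArgList relates to the entry point scripts: ArgList does not call anything. It only names the attributes whose values the agent hands to each entry point when a resource of the type is controlled. The sketch below assumes the simple positional convention (resource name first, then one value per ArgList attribute, in order). Newer agent framework versions may pass name/count/value tuples instead, so check the Agent Developer's Guide for your release. The function name and the sample values are hypothetical.

```python
# Hypothetical helper for a script-based entry point (online, offline,
# monitor or clean). Assumption: argv[1] is the resource name, followed
# by one value per ArgList attribute, in ArgList order.
ARGLIST = ["Vandalhome", "stupiduser"]  # order taken from the ABRA.cf above


def parse_entry_point_args(argv, arglist=ARGLIST):
    """Return (resource_name, {attribute: value}) from an argv-style list."""
    if len(argv) < 2:
        raise ValueError("expected at least a resource name")
    resource_name = argv[1]
    values = argv[2:]
    if len(values) != len(arglist):
        raise ValueError(
            "expected %d attribute values, got %d" % (len(arglist), len(values))
        )
    return resource_name, dict(zip(arglist, values))


if __name__ == "__main__":
    # Made-up invocation, roughly as the agent might run the online script.
    sample = ["online", "abra_res", "/opt/abra", "abrauser"]
    print(parse_entry_point_args(sample))
```

If no resource of type ABRA exists in main.cf, nothing ever builds such an argument list, which is why the scripts appear never to be called.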

2)  One of the reasons for the above question is , I think all my
applications do not have a proper Clean procedure as the clean script
does not make sense to me. Just wanted to check if Clean is implemented
on the system or not.

JS>>> You would have had to implement the clean procedure yourself.

3)  What agent attribute would be best suited for the resource to
wait for specific interval of time before starting the procedure.

JS>>> The numerical value returned by online or offline sets the number of
seconds before monitor is called.

I'm toying with the following attributes.

OnlineWaitLimit : But SADG Says "Number of monitor intervals to wait
after completing the online procedure, and before the resource becomes
online." : My requirement is to wait before starting the online
procedure

OnlineTimeout: Not convinced with this as well

ConfInterval: Had seen the implementation of this parameter in main.cmd
so not happy about it.
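
One way to read the answer to question 3: none of these attributes delays the start of the online procedure, but the online script can simply sleep before it starts the application and then use its exit value to tell the agent how long to wait before the first monitor. The sketch below is only that, a sketch. The function and parameter names are hypothetical, the upper bound of 100 is an assumption about the accepted range of the exit value, and the start command is a placeholder.

```python
import time

MAX_WAIT_HINT = 100  # assumed upper bound for the online/offline exit value


def run_online(start_app, pre_start_delay=0, monitor_wait=0, sleep=time.sleep):
    """Sleep, start the application, then return the exit code to use.

    start_app       -- callable that launches the application (site specific)
    pre_start_delay -- seconds to wait BEFORE starting the application
    monitor_wait    -- seconds the agent should wait before calling monitor
    """
    if pre_start_delay > 0:
        sleep(pre_start_delay)  # the delay happens before the start
    start_app()
    # Clamp the hint into the assumed valid range for an exit value.
    return max(0, min(int(monitor_wait), MAX_WAIT_HINT))


if __name__ == "__main__":
    code = run_online(lambda: print("starting application"), monitor_wait=10)
    print("online would exit with", code)
```

Because the sleep runs inside the online entry point, it presumably counts against OnlineTimeout, so that value would need to be larger than the delay plus the normal start time.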

Ciao


Re: [Veritas-ha] SRDF agent for cascaded SRDF in global cluster

2008-12-24 Thread Jim Senicka

The SRDF replication control agent for VCS HA/DR does not currently
support cascaded SRDF. It only supports STAR.

We are looking at adding cascaded, but no official support at this time,
and no committed date for cascade support.
Speak with your Symantec rep?

 

From: veritas-ha-boun...@mailman.eng.auburn.edu
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Pavel A
Tsvetkov
Sent: Tuesday, December 23, 2008 6:09 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] SRDF agent for cascaded SRDF in global cluster

 


Hello all! 

Just a small question. The new SRDF agent 5.0.0.4 has support for
SRDF STAR. This is a good thing.
But what about cascaded SRDF for Symmetrix DMX version 5773 and
above? If we use R1 and R12 on one site (Replicated Data Cluster
R1->R12) and R2 on another site in global cluster (R12 -> R2), is it
possible to use the SRDF VCS agent in that case? I don't see reasons why
this agent cannot be used.


Kind regards, Pavel Tsvetkov


Re: [Veritas-ha] Metro/Global cluster solution options

2008-12-24 Thread Jim Senicka

Talk with your Symantec rep?

The System Engineer can easily come in and discuss how VCS can manage
your DR Automation needs

 

 

From: veritas-ha-boun...@mailman.eng.auburn.edu
[mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of rajesh
Kharya (rkharya)
Sent: Wednesday, December 24, 2008 5:15 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Metro/Global cluster solution options

 

Hi,

 

We are evaluating possible clustering solution for one project where
entire application environment will be hosted in 2 Data centers, some 50
miles apart. The application environment will be identical on both the
DCs and will be accessed via Global Site Selector/Application Control
Engines at the network layer. At the very back end we have a requirement
of putting a 2 node cluster in each data center on Linux OS preferably.
Within a DC one node will be active while the other will be passive.
Storage will be configured as mounted file systems. We need to know in
what way VCS can help in -

 

A) data replication between the clusters in 2 DCs, assuming the two
clusters are working independently.

B) Is there a possibility of having all nodes(1-4) part of a single
cluster, where they are separated out by 50 miles and have common
storage between them(possibly CFS implementation). Node1/3 remains
active while Node2/4 are standbys. 

 

 

 Site A                Site B
 ----------------------------
 Node1                 Node3
 |-------------------->|
 Node2                 Node4

 

 

Any pointers to references/documentations appreciated.

 

Thanks,

 

~ Rajesh.


Re: [Veritas-ha] hagrp -switch

2008-11-05 Thread Jim Senicka

It depends on your application. What would your application do if you shut it 
down from the command line and moved it?
VCS is 100% orthogonal to this discussion, as it is just executing the 
commands. 




 -Original Message-
From:   upen [mailto:[EMAIL PROTECTED]
Sent:   Wednesday, November 05, 2008 05:23 PM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject:[Veritas-ha] hagrp -switch

Hi ,

VCS version is 3.5 and OS is solaris 9

Just a basic question: I am going to run the following command on one of
the cluster nodes to switch the application service group to the other node.
The application SG includes resources like SAN disks, http, tomcat, oracle
listener/DB.

# hagrp -switch service_group -to system

I am wondering what happens to the users who are already logged into the
web-site and perhaps writing to a forum: do they have to
re-login? And do they lose all data while writing a post in the
application-based discussion forum?

Please let me know,

Thanks


Re: [Veritas-ha] Question about HA and disks

2008-10-28 Thread Jim Senicka

Ok
Looks like I was wrong here. Pluto is not dying a clean death.
I/O fencing may very well have prevented this issue

-Original Message-
From: Andrey Dmitriev [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 28, 2008 2:56 AM
To: Jon E Price/SYS/NYTIMES; Jim Senicka
Cc: Joshua Fielden; veritas-ha@mailman.eng.auburn.edu
Subject: RE: [Veritas-ha] Question about HA and disks

Sure,

Don't actually know what's causing this.. We've found that when the NFS
server fails, one of the db servers pukes. (NFS is used for backups)

It's not actually clear from the logs if pluto crashed. It's almost as
if the HB somehow went away (not sure how that's possible, heartbeats go
across 2 hubs), and _then_ it crashed (when the other server took over
the resources).

And per other posts, yes I'll look into io fencing (forgot about the
feature, thnx for reminding), and yes, we are thinking of upgrading to
RH5 with kdump (crashdump)

Oct 18 13:42:45 pluto Had[6220]: VCS WARNING V-16-1-51047 HAD Self
Check: Excessive delay in the HAD heartbeat to GAB (10 seconds)
Oct 18 13:42:45 pluto Had[6220]: VCS WARNING V-16-1-53024 HAD Signal
SIGABRT received
Oct 18 13:42:45 pluto Had[6220]: VCS NOTICE V-16-1-53028 Beginning
execution of the diagnostics script
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20057 Port h process
6220 inactive 7 sec
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20057 Port h process
6220 inactive 8 sec
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20057 Port h process
6220 inactive 9 sec
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20057 Port h process
6220 inactive 10 sec
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20057 Port h process
6220 inactive 11 sec
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20057 Port h process
6220 inactive 12 sec
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20057 Port h process
6220 inactive 13 sec
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20057 Port h process
6220 inactive 14 sec
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20058 Port h process
6220: heartbeat failed, killing process
Oct 18 13:42:45 pluto kernel: GAB INFO V-15-1-20059 Port h heartbeat
interval 15000 msec. Statistics:
Oct 18 13:42:45 pluto kernel: GAB INFO V-15-1-20129 Port h: heartbeats
in 0 ~ 3000 msec: 87923248
Oct 18 13:42:45 pluto kernel: GAB INFO V-15-1-20129 Port h: heartbeats
in 3000 ~ 6000 msec: 0
Oct 18 13:42:45 pluto kernel: GAB INFO V-15-1-20129 Port h: heartbeats
in 6000 ~ 9000 msec: 0
Oct 18 13:42:45 pluto kernel: GAB INFO V-15-1-20129 Port h: heartbeats
in 9000 ~ 12000 msec: 0
Oct 18 14:05:52 pluto kernel: GAB INFO V-15-1-20129 Port h: heartbeats
in 12000 ~ 15000 msec: 0
Oct 18 14:05:52 pluto kernel: GAB INFO V-15-1-20041 Port h: client
process failure: killing process
Oct 18 14:05:52 pluto kernel: 00 0  2283  1  2339
1564 (NOTLB)
Oct 18 14:05:52 pluto kernel: 0106f3467b98 0002
010527635800 0246
Oct 18 14:05:52 pluto AgentFramework[6681]: VCS ERROR V-16-1-13027
Thread(4017064880) Resource(hok2_listener) - monitor procedure did not
complete within the expected time.
Oct 18 14:05:52 pluto kernel:0246 8029eb47
0105fd042080 2d877320
Oct 18 14:05:52 pluto AgentFramework[6681]: VCS ERROR V-16-1-13027
Thread(4006575024) Resource(str2_listener) - monitor procedure did not
complete within the expected time.
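
For anyone who wants to pull the key numbers out of a log like the one above, here is a small sketch. It only relies on the two message shapes visible in this excerpt ("inactive N sec" and "heartbeat interval N msec"). The function name is hypothetical and the regular expressions are assumptions based on these lines, not on a documented log format.

```python
import re

# A few lines copied from the excerpt above, joined back onto single lines.
SAMPLE_LOG = """\
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20057 Port h process 6220 inactive 13 sec
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20057 Port h process 6220 inactive 14 sec
Oct 18 13:42:45 pluto kernel: GAB WARNING V-15-1-20058 Port h process 6220: heartbeat failed, killing process
Oct 18 13:42:45 pluto kernel: GAB INFO V-15-1-20059 Port h heartbeat interval 15000 msec. Statistics:
"""

INACTIVE_RE = re.compile(r"Port (\w+) process (\d+) inactive (\d+) sec")
INTERVAL_RE = re.compile(r"Port (\w+) heartbeat interval (\d+) msec")


def summarize_gab(log_text):
    """Return the longest reported inactivity and the heartbeat interval."""
    # Mail clients wrap long lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    inactive = [int(m.group(3)) for m in INACTIVE_RE.finditer(flat)]
    interval = INTERVAL_RE.search(flat)
    return {
        "max_inactive_sec": max(inactive) if inactive else None,
        "interval_sec": int(interval.group(2)) / 1000 if interval else None,
    }


if __name__ == "__main__":
    print(summarize_gab(SAMPLE_LOG))
```

In this excerpt the last inactivity warning is at 14 seconds and the interval is 15000 msec, which fits HAD being killed once it missed its heartbeat to GAB for the full interval.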

And the other node (vcs log)

2008/10/18 13:43:06 VCS INFO V-16-1-10077 Received new cluster
membership
2008/10/18 13:43:06 VCS NOTICE V-16-1-10080 System (sun) - Membership:
0xc, Jeopardy: 0x0
2008/10/18 13:43:06 VCS ERROR V-16-1-10079 System pluto (Node '1') is in
Down State - Membership: 0xc
2008/10/18 13:43:06 VCS ERROR V-16-1-10322 System pluto (Node '1')
changed state from RUNNING to FAULTED
2008/10/18 13:43:06 VCS NOTICE V-16-1-10446 Group pluto_gp is offline on
system pluto
2008/10/18 13:43:06 VCS INFO V-16-1-10493 Evaluating pluto as potential
target node for group pluto_gp
2008/10/18 13:43:06 VCS INFO V-16-1-10494 System pluto not in RUNNING
state
2008/10/18 13:43:06 VCS INFO V-16-1-10493 Evaluating sun as potential
target node for group pluto_gp
2008/10/18 13:43:06 VCS INFO V-16-1-10493 Evaluating mars as potential
target node for group pluto_gp
2008/10/18 13:43:06 VCS NOTICE V-16-1-10301 Initiating Online of
Resource plutodg (Owner: unknown, Group: pluto_gp) on System sun
2008/10/18 13:43:06 VCS NOTICE V-16-1-10301 Initiating Online of
Resource orapluto (Owner: unknown, Group: pluto_gp) on System sun
2008/10/18 13:43:06 VCS INFO V-16-6-15004 (mars) hatrigger:Failed to
send trigger for sysoffline; script doesn't exist
2008/10/18 13:43:06 VCS NOTICE V-16-10031-1514 (sun)
DiskGroup:plutodg:online:Diskgroups will be imported without
reservations.
2008/10/18 13:43:07 VCS WARNING V-16-10031-1516 (sun)
DiskGroup:plutodg:online:Trying force import for the diskgroup.
2008/10/18 13:43:07 VCS WARNING V-16-
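
The "Membership: 0xc" value in the log above can be decoded with a few lines of code, assuming it is a bitmask in which bit N is set when node ID N is a member. That assumption fits this log: pluto is Node '1' and its bit is clear. The function name is hypothetical.

```python
def members_from_mask(mask):
    """Return the node IDs whose bits are set in a membership bitmask."""
    node_ids = []
    node_id = 0
    while mask:
        if mask & 1:
            node_ids.append(node_id)
        mask >>= 1
        node_id += 1
    return node_ids


if __name__ == "__main__":
    # 0xc is binary 1100, so bits 2 and 3 are set.
    print(members_from_mask(0xC))
```

Under that reading, 0xc means node IDs 2 and 3 are still members, which would match the surviving systems sun and mars (an inference, the log does not state their IDs).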

Re: [Veritas-ha] Question about HA and disks

2008-10-27 Thread Jim Senicka

While I think fencing is always the right choice, I still think this was
a system issue. The system stopped heart beating for 16 seconds, plus
the 5 seconds gab stable time out. At this point, VCS failed over.
Fencing would not have been in play until the import on the second node.
So if the corruption happened during the 21 seconds, it would not have
helped.
If there is a case where the node is "nearly dead" for an extended
period of time, not capable of kernel level heartbeat from LLT, but is
still writing to disk, then by all means you need I/O fencing to protect
you from the OS.
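
The timing argument above, written out as arithmetic. The two constants are the values quoted in this message. The function names are hypothetical, and the second function only restates the reasoning here (fencing acts at import time on the second node), not a documented rule.

```python
LLT_PEER_INACTIVE_SEC = 16  # heartbeat silence before the peer is declared dead
GAB_STABLE_TIMEOUT_SEC = 5  # GAB stable timeout quoted in the message


def takeover_delay(peer_inactive=LLT_PEER_INACTIVE_SEC,
                   gab_stable=GAB_STABLE_TIMEOUT_SEC):
    """Seconds between the last heartbeat and the start of failover."""
    return peer_inactive + gab_stable


def fencing_could_help(seconds_after_last_heartbeat):
    """True if a stray write would land after the other node's takeover."""
    return seconds_after_last_heartbeat >= takeover_delay()


if __name__ == "__main__":
    print(takeover_delay())
    print(fencing_could_help(10), fencing_could_help(600))
```

A write 10 seconds after the last heartbeat falls inside the 21-second window, where fencing is not yet in play. A write minutes later, from a node that is nearly dead but still doing I/O, is the case fencing is meant to stop.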





-Original Message-
From: Brad Boyer 
Sent: Monday, October 27, 2008 8:57 PM
To: Jim Senicka; Jon E Price/SYS/NYTIMES; Andrey Dmitriev; Joshua
Fielden; veritas-ha@mailman.eng.auburn.edu
Subject: RE: [Veritas-ha] Question about HA and disks

Based on the original description, I would presume that the system did
not actually panic immediately. I've seen Linux systems oops without
immediate panics many times. I would make no assumption of what the
dying system was doing in this case without real evidence, especially
not that it actually got as far as a panic. Linux is not UNIX (it's just
unofficially POSIX compliant), and you shouldn't make the assumption
that Linux will act like UNIX (it definitely acts different in quite a
few ways). Seeing as this is RHEL4, this system probably isn't even
capable of taking a crash dump, and thus would be unlikely to be taking
time writing a crash dump as opposed to doing some damage to the data on
disk. Even with the current Red Hat release (RHEL5) crash dumps aren't
enabled by default.

My suggestion is that using I/O fencing would be the right answer here.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jim
Senicka
Sent: Monday, October 27, 2008 5:21 PM
To: Jon E Price/SYS/NYTIMES; Andrey Dmitriev; Joshua Fielden;
veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Question about HA and disks

In the original message
" We had an issue where a serverA failed and serverB took over.
However, serverB took over when serverA was still 'crashing' (it took a
good 10-15mins to crash),"

I can assume crash = panic, as "crashing" has to refer to dumping core
to disk.

If this is the case, there will be no logs on server A, as it is mid
panic.

In this case (the node is in the middle of a crash dump), it will not be
writing to data disks. What ever was written happened before the kernel
call to panic. Fencing will protect that data once the new node imports,
but in the case described here, the corruption had to happen before the
panic, so fence would not have helped.

Bottom line is the node ceased writing as soon as the non maskable
interrupt was called for panic (unless Linux somehow violates every Unix
kernel rule, which I seriously doubt). When VCS took over the service
group on Server B, Server A was down and could not have been writing


-Original Message-
From: Jon E Price/SYS/NYTIMES [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 27, 2008 8:14 PM
To: Jim Senicka; Andrey Dmitriev; Joshua Fielden;
veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Question about HA and disks

Hi,

A few questions..

Andrey: Could you post the logs (or even portions of them) which show
what ServerA was doing during the takeover?

Joshua: You're saying that IO Fencing can prevent split brain situations
in which one server is still writing to a filesystem while a 2nd server
has taken over that same service group and begun writing to the same fs,
thus possibly causing corruption?

http://sfdoccentral.symantec.com/sf/5.0/linux/html/vcs_install/ch_vcs_install_iofence.html#190559

Jim: What's the evidence that the server panic'd?
And is 16 seconds the default for the heartbeat failure?


Jon

From: "Jim Senicka" <[EMAIL PROTECTED]>
Sent by: veritas-ha-bounce [EMAIL PROTECTED]
Sent: 10/27/2008 07:19 PM
To: "Andrey Dmitriev" <[EMAIL PROTECTED]>
Subject: Re: [Veritas-ha] Question about HA and disks

When a server panics, it stops writing to anything but the dump device.
VCS did exactly as designed. 16 seconds after heartbeat failure it
started takeover. Whatever was damaged on your file system was already
damaged at that point, regardless how long it took to dump core to the
dump device. I would look at the cause of the panic, and it is likely it
was something to do with what garbaged your FS


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrey
Dmitriev
Sent: Monday, October 27, 2008 2:01 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha]

Re: [Veritas-ha] Question about HA and disks

2008-10-27 Thread Jim Senicka

In the original message
" We had an issue where a serverA failed and serverB took over.
However, serverB took over when serverA was still 'crashing' (it took a
good 10-15mins to crash),"

I can assume crash = panic, as "crashing" has to refer to dumping core
to disk.

If this is the case, there will be no logs on server A, as it is mid
panic.

In this case (the node is in the middle of a crash dump), it will not be
writing to data disks. What ever was written happened before the kernel
call to panic. Fencing will protect that data once the new node imports,
but in the case described here, the corruption had to happen before the
panic, so fence would not have helped.

Bottom line is the node ceased writing as soon as the non maskable
interrupt was called for panic (unless Linux somehow violates every Unix
kernel rule, which I seriously doubt). When VCS took over the service
group on Server B, Server A was down and could not have been writing


-Original Message-
From: Jon E Price/SYS/NYTIMES [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 27, 2008 8:14 PM
To: Jim Senicka; Andrey Dmitriev; Joshua Fielden;
veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Question about HA and disks

Hi,

A few questions..

Andrey: Could you post the logs (or even portions of them) which show
what ServerA was doing during the takeover?

Joshua: You're saying that IO Fencing can prevent split brain situations
in which one server is still writing to a filesystem while a 2nd server
has taken over that same service group and begun writing to the same fs,
thus possibly causing corruption?

http://sfdoccentral.symantec.com/sf/5.0/linux/html/vcs_install/ch_vcs_install_iofence.html#190559

Jim: What's the evidence that the server panic'd?
And is 16 seconds the default for the heartbeat failure?


Jon

From: "Jim Senicka" <[EMAIL PROTECTED]>
Sent by: veritas-ha-bounce [EMAIL PROTECTED]
Sent: 10/27/2008 07:19 PM
To: "Andrey Dmitriev" <[EMAIL PROTECTED]>
Subject: Re: [Veritas-ha] Question about HA and disks

When a server panics, it stops writing to anything but the dump device.
VCS did exactly as designed. 16 seconds after heartbeat failure it
started takeover. Whatever was damaged on your file system was already
damaged at that point, regardless how long it took to dump core to the
dump device. I would look at the cause of the panic, and it is likely it
was something to do with what garbaged your FS


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrey
Dmitriev
Sent: Monday, October 27, 2008 2:01 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Question about HA and disks

We had an issue where a serverA failed and serverB took over.
However, serverB took over when serverA was still 'crashing' (it took a
good 10-15mins to crash), and apparently still had a hold of file
systems (system logs confirm that takeover occurred while serverA was
still 'puking').
The file systems on ServerB came up corrupt, and we lost some data b/c
of that.
HA is setup via heartbeats. File system is vxfs, OS is RedHat 4.0.
Is there any way to avoid that?

Thanks,
Andrey


Re: [Veritas-ha] Question about HA and disks

2008-10-27 Thread Jim Senicka

When a server panics, it stops writing to anything but the dump device.
VCS did exactly as designed. 16 seconds after heartbeat failure it
started takeover. Whatever was damaged on your file system was already
damaged at that point, regardless how long it took to dump core to the
dump device. I would look at the cause of the panic, and it is likely it
was something to do with what garbaged your FS


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrey
Dmitriev
Sent: Monday, October 27, 2008 2:01 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Question about HA and disks

We had an issue where a serverA failed and serverB took over.
However, serverB took over when serverA was still 'crashing' (it took a
good 10-15mins to crash), and apparently still had a hold of file
systems (system logs confirm that takeover occurred while serverA was
still 'puking').
The file systems on ServerB came up corrupt, and we lost some data b/c
of that.
HA is setup via heartbeats. File system is vxfs, OS is RedHat 4.0.
Is there any way to avoid that?

Thanks,
Andrey


Re: [Veritas-ha] VCS 5.0 MP1: issue probing disk-group !?

2008-10-22 Thread Jim Senicka

Cut and paste main.cf for the service group in question?


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pascal
Grostabussiat
Sent: Tuesday, October 21, 2008 9:11 AM
Cc: Veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] VCS 5.0 MP1: issue probing disk-group !?

To Jim, Scott and Gene.

Jim Senicka wrote:
> Is the disk group agent running on the systems?
>   
Yes it is:

root 16295 1   0 16:16:01 ?   1:29 
/opt/VRTSvcs/bin/DiskGroup/DiskGroupAgent -type DiskGroup

> Has the cluster been started since you created the service group
> definition?
>   
Yes. I restarted VCS hoping it might somehow change something, but no.
I am thinking about rebooting one server.
> Are all resources enabled in the service groups?
>   
Yes. I tried to disable them and re-enable them. But I come back to the 
same situation.


Scott3, James wrote:
> Have you made sure the volumes are ENABLED ACTIVE?  Can you send a
> vxprint on the group?  Is it a shared group or a active/passive group?
> Also send a vxdg list. 
Enabled and active, yes. The disk-group is active/passive (to be mounted
on one host at a time).

bash-3.00# vxprint -l dba_DG
Disk group: dba_DG

Group:dba_DG
info: dgid=1224062934.119.
version:  140
alignment: 8192 (bytes)
detach-policy: global
dg-fail-policy: dgdisable
copies:   nconfig=default nlog=default
devices:  max=32767 cur=3
minors:   >= 62000
cds=on

bash-3.00# vxprint -g dba_DG
TY NAME           ASSOC                    KSTATE   LENGTH    PLOFFS STATE  TUTIL0 PUTIL0
dg dba_DG         dba_DG                   -        -         -      -      -      -

dm dba_DG01       c0t216000C0FF87E774d10s2 -        525417536 -      -      -      -

v  dba_archive    fsgen                    ENABLED  20971520  -      ACTIVE -      -
pl dba_archive-01 dba_archive              ENABLED  20971520  -      ACTIVE -      -
sd dba_DG01-02    dba_archive-01           ENABLED  20971520  0      -      -      -

v  dba_data       fsgen                    ENABLED  104857600 -      ACTIVE -      -
pl dba_data-01    dba_data                 ENABLED  104857600 -      ACTIVE -      -
sd dba_DG01-03    dba_data-01              ENABLED  104857600 0      -      -      -

v  dba_redo       fsgen                    ENABLED  20971520  -      ACTIVE -      -
pl dba_redo-01    dba_redo                 ENABLED  20971520  -      ACTIVE -      -
sd dba_DG01-01    dba_redo-01              ENABLED  20971520  0      -      -      -

bash-3.00# vxdg list
NAME         STATE           ID
xxx_DG       enabled,cds     1224062531.89.
xxx_DG       enabled,cds     1224062634.101.
xxx_DG       enabled,cds     1224062699.109.
dba_DG       enabled,cds     1224062934.119.
xxx_DG       enabled,cds     1224062443.81.
xxx_DG       enabled,cds     1224062569.93.
xxx_DG       enabled,cds     1224062672.105.
xxx_DG       enabled,cds     1224062491.85.

Gene Henriksen wrote:
> If you have a "?" in the GUI, then it cannot probe the resource on one
> system or the other. It will not import on either until it is probed on
> both. This is to avoid a concurrency violation.
>   
Yes. Fully agree.
> Hold the cursor on the resource and a pop-up box should show the status
> so you can see where it is not probed.
>   
Status is "unknown" on both server A and B.
> This could be due to one system having never seen the DG. Can you run
> vxdisk -o alldgs list and see the DG on both systems?
>   
I can import/deport that disk-group using vxdg without a problem

bash-3.00# vxdisk -o alldgs list
DEVICE                   TYPE         DISK      GROUP     STATUS
c0t216000C0FF87E774d0s2  auto:none    -         -         online invalid
c0t216000C0FF87E774d1s2  auto:cdsdisk xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d2s2  auto:cdsdisk xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d3s2  auto:cdsdisk xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d4s2  auto:cdsdisk xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d5s2  auto:cdsdisk -         (xxx_DG)  online
c0t216000C0FF87E774d6s2  auto:cdsdisk xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d7s2  auto:cdsdisk xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d8s2  auto:cdsdisk xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d9s2  auto:cdsdisk -         (xxx_DG)  online
c0t216000C0FF87E774d10s2 auto:cdsdisk dba_DG01  dba_DG    online
c2t0d0s2                 auto:none    -         -         online invalid
c2t2d0s2                 auto:none    -         -         online invalid
c2t3d0s2                 auto:none    -         -         online invalid
> The other possibility is a typo in the DiskGroup resource attribute.
> Make sure it has no leading spaces, is the correct case (just like
> vxdisk list shows it.
I thought about this and double-checked. Nothing. I recreated the 
resource and paid attention to such possibility, nothing.
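
The typo check suggested above (leading spaces, wrong case) is easy to get wrong by eye, so here is a small sketch of doing it mechanically. It compares the configured DiskGroup attribute value against the names reported by vxdg list. The function name and return strings are hypothetical, and how you collect the two inputs is left to you.

```python
def check_diskgroup_name(configured, known_groups):
    """Explain why a configured DiskGroup value fails to match, if it does."""
    if configured in known_groups:
        return "ok"
    stripped = configured.strip()
    if stripped in known_groups:
        return "leading or trailing whitespace"
    by_lower = {name.lower(): name for name in known_groups}
    if stripped.lower() in by_lower:
        return "case mismatch, expected %s" % by_lower[stripped.lower()]
    return "no such disk group"


if __name__ == "__main__":
    groups = ["dba_DG"]
    for value in ["dba_DG", " dba_DG", "dba_dg", "dbaDG"]:
        print(repr(value), "->", check_diskgroup_name(value, groups))
```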

Regards,
/Pascal


Re: [Veritas-ha] VCS 5.0 MP1: issue probing disk-group !?

2008-10-21 Thread Jim Senicka

Is the disk group agent running on the systems?
Has the cluster been started since you created the service group
definition?
Are all resources enabled in the service groups?


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pascal
Grostabussiat
Sent: Tuesday, October 21, 2008 7:09 AM
To: Veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] VCS 5.0 MP1: issue probing disk-group !?

Hi,

I have been experiencing a weird issue since yesterday and I cannot get it
solved by surfing and checking around. So I hope to get a hint using the
mailing-list.

Our sysadmin recently installed a system with two Sun SPARC for me with 
VxVM, VxFS and VCS. In short I have VERITAS Foundation 5.0 with MP1.

  DESC:  Veritas Cluster Server by Symantec
PSTAMP:  Veritas-5.0MP1-11/29/06-17:15:00

  DESC:  Virtual Disk Subsystem
PSTAMP:  Veritas-5.0-MP1.26:2007-02-28

  DESC:  Commercial File System
PSTAMP:  VERITAS-FS-5.0.1.0-2007-01-17-5.0MP1=123202-02

Now I have an issue with all disk-groups, for example dba_DG. Using
the command line or the VERITAS Enterprise Administrator I can
import/deport the disk-group, I can mount the corresponding volumes and
create file-systems on them. No issue there.

Now I go to the VERITAS Cluster Administrator, where our sysadmin had
already created resources for the disk-groups. However, I cannot bring
any of them online because the GUI keeps on telling me that the resource has
not been probed on the system (I have two systems, tried to online on A
and B, but same behavior). I deleted the resource, created a new one,
same issue. I still have a "?" mark on the resource. Issuing a probe
does not solve anything. I checked the engine_A.log and can see that the
probe was fired, but nothing more. I can run hares -probe dba_DG
-sys A and I get the prompt back, nothing else appears!?

I am puzzled ! Any idea ? Any known issue ?

Many thanks in advance.
Regards,
/Pascal

Re: [Veritas-ha] IPMultiNICB, mpathd and network outages

2008-10-20 Thread Jim Senicka

I would be more concerned about future failures being handled properly.
If you were able to take out all networks from all nodes at same time,
you have a SPOF. If this was a one time maintenance upgrade to your
network gear and not a normal event, setting VCS to not respond to
network events means that future cable or port issues will not be
handled.
If it is a common occurrence for all networks to be lost, perhaps you
need to address the network issues :-)



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
DeMontier, Frank
Sent: Monday, October 20, 2008 11:10 AM
To: Paul Robertson; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] IPMultiNICB, mpathd and network outages

FaultPropagation=0 should do it.

Buddy DeMontier
State Street Global Advisors
Infrastructure Technical Services
Boston Ma 02111

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Paul
Robertson
Sent: Monday, October 20, 2008 10:37 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] IPMultiNICB, mpathd and network outages

We recently experienced a Cisco network issue which prevented all
nodes in that subnet from accessing the default gateway for about a
minute.

The Solaris nodes which run probe-based IPMP reported that all
interfaces had failed because they were unable to ping the default
gateway; however, they came back within seconds once the network issue
was resolved. Fine.

Unfortunately, our VCS nodes initiated an offline of the service group
after the IPMultiNICB resources detected the IPMP fault. Since the
service group offline/online takes several minutes, the outage on
these nodes was more painful. Furthermore, since the peer cluster
nodes in the same subnet were also experiencing the same mpathd fault,
there would have been little advantage to failing over the service
group to another node.

We would like to find a way to configure VCS so that the service group
does not offline (and any dependent resources within the service group
are not offlined) in the event of an mpathd (i.e. IPMultiNICB)
failure. In looking through the documentation, it seems that the
closest we can come is to increase the IPMultiNICB ToleranceLimit from
"1" to a huge value:

 # hatype -modify IPMultiNICB ToleranceLimit 

This should achieve our desired goal, but I can't help thinking that
it's an ugly hack, and that there must be a better way. Any
suggestions are appreciated.
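
Rather than a huge value, the limit could be sized to the outage you want to ride out. The sketch below assumes ToleranceLimit is the number of consecutive OFFLINE monitor results tolerated before the resource is declared faulted, and that monitor runs once every MonitorInterval seconds. The function name is hypothetical and the interval values in the example are illustrative, not the defaults for this resource type.

```python
def min_tolerance_limit(outage_seconds, monitor_interval_seconds):
    """Worst-case number of monitor cycles that can fall inside the outage."""
    if monitor_interval_seconds <= 0:
        raise ValueError("monitor interval must be positive")
    # A cycle can land at both the very start and the very end of the
    # outage, hence the extra one.
    return int(outage_seconds // monitor_interval_seconds) + 1


if __name__ == "__main__":
    # A one-minute outage with an assumed 10-second monitor interval.
    print(min_tolerance_limit(60, 10))
```

A finite value sized this way still lets a genuine, longer-lasting interface failure fault the resource, which a very large value would effectively prevent.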

Cheers,

Paul

P.S. A snippet of the main.cf file is listed below:


 group multinicbsg (
   SystemList = { app04 = 1, app05 = 2 }
   Parallel = 1
   )

   MultiNICB multinicb (
   UseMpathd = 1
   MpathdCommand = "/usr/lib/inet/in.mpathd -a"
   Device = { ce0 = 0, ce4 = 2 }
   DefaultRouter = "192.168.9.1"
   )

   Phantom phantomb (
   )

   phantomb requires multinicb

 group app_grp (
   SystemList = { app04 = 0, app05 = 0 }
   )

   IPMultiNICB app_ip (
   BaseResName = multinicb
   Address = "192.168.9.34"
   NetMask = "255.255.255.0"
   )

   Proxy appmnic_proxy (
   TargetResName = multinicb
   )

   (various other resources, including some that depend on app_ip
   excluded for brevity)

   app_ip requires appmnic_proxy
___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] Server crashes but VCS doesn't detect it

2008-06-17 Thread Jim Senicka

If power cycle fixed it, it was still heartbeating on LLT. 

Sent from my Nokia E62 handheld by goodlink.

 -Original Message-
From:   Andrey Dmitriev [mailto:[EMAIL PROTECTED]
Sent:   Tuesday, June 17, 2008 03:33 PM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Cc: veritas-ha@mailman.eng.auburn.edu
Subject:[Veritas-ha] Server crashes but VCS doesn't detect it

Had sort of a weird case today.
We had a server failure: it lost network, and the console was filling with some sort 
of crash info.
The cluster, however, showed everything online. We also had netdump configured 
(Linux), but that couldn't work because the network was down.
The customer is unhappy that it didn't fail over.
Can anyone think of a reason, or of how I can prevent something similar in 
the future?
I sort of suspect LLT might still have been partly up.
It wasn't until we power-cycled the box that the other nodes detected it was down.

-andrey

Re: [Veritas-ha] vxfencing >2 nodes

2008-06-12 Thread Jim Senicka

We currently do not have I/O fencing implemented on the VCS Windows
package.

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Thursday, June 12, 2008 4:37 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] vxfencing >2 nodes

Agreed.

But is it even possible to run fencing on a w*dows system?

/A
---
Andreas Lundgren
Technical Account Manager | Steria AB
Karlsrov 2D, Box 544, SE-182 15 Danderyd, Sweden
Mobile: +46-709-214381
Email: [EMAIL PROTECTED]

[EMAIL PROTECTED] wrote: -

To: "Shashi Kanth Boddula" <[EMAIL PROTECTED]>,
"Mayank Vasa" <[EMAIL PROTECTED]>
From: "Jim Senicka" <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
Date: 2008-06-12 17:33
cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] vxfencing >2 nodes

For any cluster larger than 1 node, I/O fencing is highly recommended to
protect data integrity in the event of a split brain.
2 nodes is not in any way more resistant to split brain than 3 nodes or
more.

VCS does not use any form of quorum based membership (quorum has a
number of its own ugly issues), so there is no difference in how our
membership works when you have 2, 3, or 32 nodes.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Shashi
Kanth Boddula
Sent: Thursday, June 12, 2008 7:30 AM
To: Mayank Vasa
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] vxfencing >2 nodes

Ok, thanks for clarification.

I have seen documentation for many clustering products which says that
fencing/quorum is optional or not required for clusters of more than 2 nodes,
and that there is much less chance of a split brain condition in clusters of
more than 2 nodes.

-- Shashi

Mayank Vasa wrote:
> Shashi:
>
> The number of nodes is not a decision making factor for fencing. For a
> cluster greater than 2 nodes, fencing helps to protect your data in 
> the case of a split brain scenario.
>
> SFRAC requires fencing. It is not supported without it.
>
> Regards,
> + Mayank
>
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Shashi
> Kanth Boddula
> Sent: Wednesday, June 11, 2008 12:23 AM
> To: veritas-ha@mailman.eng.auburn.edu
> Subject: [Veritas-ha] vxfencing >2 nodes
>
> Is vxfencing required if we go for >=3 node cluster ?  Or, vxfencing 
> is optional if we go for >=3 node cluster ?
>
>
> I am going for 4-node VCS5 SFRAC, still vxfencing required for me ? , 
> does all VCS5 SFRAC modules work properly without vxfencing ?

Re: [Veritas-ha] vxfencing >2 nodes

2008-06-12 Thread Jim Senicka

For any cluster larger than 1 node, I/O fencing is highly recommended to
protect data integrity in the event of a split brain.
2 nodes is not in any way more resistant to split brain than 3 nodes or
more.


VCS does not use any form of quorum based membership (quorum has a
number of its own ugly issues), so there is no difference in how our
membership works when you have 2, 3, or 32 nodes.
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Shashi
Kanth Boddula
Sent: Thursday, June 12, 2008 7:30 AM
To: Mayank Vasa
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] vxfencing >2 nodes

Ok, thanks for clarification.

I have seen documentation for many clustering products which says that
fencing/quorum is optional or not required for clusters of more than 2 nodes,
and that there is much less chance of a split brain condition in clusters of
more than 2 nodes.

-- Shashi

Mayank Vasa wrote:
> Shashi:
>
> The number of nodes is not a decision making factor for fencing. For a
> cluster greater than 2 nodes, fencing helps to protect your data in 
> the case of a split brain scenario.
>
> SFRAC requires fencing. It is not supported without it.
>
> Regards,
> + Mayank
>
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Shashi
> Kanth Boddula
> Sent: Wednesday, June 11, 2008 12:23 AM
> To: veritas-ha@mailman.eng.auburn.edu
> Subject: [Veritas-ha] vxfencing >2 nodes
>
> Is vxfencing required if we go for >=3 node cluster ?  Or, vxfencing 
> is optional if we go for >=3 node cluster ?
>
>
> I am going for 4-node VCS5 SFRAC, still vxfencing required for me ? , 
> does all VCS5 SFRAC modules work properly without vxfencing ?

Re: [Veritas-ha] VCS with replicated storage

2008-06-05 Thread Jim Senicka

You are attempting to build what is called a Replicated Data Cluster.
This should be documented in the UG as I recall.
You will use identical DG and volume resources, with the appropriate
replication management resource under the DG. To do this and comply with
the EULA, you need the HA/DR Edition of VCS, to license you to use the
replication agents.
 
In an RDC, the replication agent manages read/write enabling and
direction of replication.  When you fail over, the opposite node is write
enabled, then the normal DG and volume agents bring up the storage.
Hugh Shannon here at Symantec is the Technical Product Manager
responsible for these types of configs.



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Esson,
Paul
Sent: Thursday, June 05, 2008 9:40 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] VCS with replicated storage



Folks,

 

My background with VCS is limited to local clusters with shared storage
arrays and software mirroring of volumes using VxVM.

I have been asked to implement a VCS 5.0 cluster on Solaris 10 using
replicated block-level NetApps storage.  This will be a stretched
cluster with one node on each of two sites and heartbeat connections
using VLANs.  

 

What I am struggling with at the moment is how to configure the storage
resources within VCS.  I am used to defining shared volume groups/volumes,
but as I see it each node will effectively have a local LUN or LUNs with
blocks being replicated at the array level from the active to the
inactive node.

 

Do I create separate Volume Groups and Volumes on each node and set the
associated attributes on a per system basis such that failover starts
the application up mounting the file system on the replica volume of the
alternative node?

 

Regards

Paul Esson 
Redstor Limited 

Direct:   +44 (0) 1224 595381 
Mobile:  +44 (0) 7766 906514 
E-Mail:  [EMAIL PROTECTED] 
Web:www.redstor.com 

REDSTOR LIMITED 
Torridon House 
73-75 Regent Quay 
Aberdeen 
UK 
AB11 5AR 


Re: [Veritas-ha] Veritas Cluster Server

2008-06-05 Thread Jim Senicka

Have you opened a support case?
 
To the best of my knowledge, VCS 4.1 does not support RHEL 5. 
Support can confirm.



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Goutham
N
Sent: Thursday, June 05, 2008 8:37 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Veritas Cluster Server


 
Hi,
 
I am installing Veritas Cluster Server 4.1 in a Red Hat Linux 5.x
environment. I am getting the following error message.
Can anyone help with a solution?
 
 

Cluster Server configured successfully.

Starting Cluster Server:
Starting LLT on usplselux141
/etc/init.d/llt start 2>&1
exit=256
Starting LLT: 
LLT: loading module...
LLT:Error: cannot find compatible module binary
/sbin/lltconfig 2>&1
exit=256
LLT lltconfig ERROR V-14-2-15000 open /dev/llt failed: No such file or directory

Error
CPI ERROR V-9-120-1171 Could not start LLT on usplselux141: LLT lltconfig ERROR V-14-2-15000 open /dev/llt failed: No such file or directory

The installvcs log is saved at:
/opt/VRTS/install/logs/installvcs605084118.log



-- 
N. Gowthaman 

Re: [Veritas-ha] .stale file

2008-06-03 Thread Jim Senicka

You only need one notifier, usually in the CSG.
No need for a proxy anywhere else.

From: i man [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 03, 2008 12:00 PM
To: Gene Henriksen
Cc: John Cronin; Jim Senicka; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] .stale file

Gene,John,Jim,

That's excellent. Many thanks again for the new ideas. There is one
last query regarding the whole activity. 

This is regarding the use of a Proxy for the notifier. Nobody has been able to
tell me definitely whether this is required for the notifier. If I
create my notifier in the Cluster service group or any other service
group, does it require a proxy to send alerts? If so, and if I create the
notifier in a separate service group, is it fine if I create the proxy in
the Cluster service group? 

Having gone through the BARG, there are sample examples which explain the
notifier's dependency on a proxy, but even without the proxy things seem to
be working fine for me on a test system. Also, when installing through the GUI
it does ask for some NIC card information, a step which I always
skipped; I don't know how relevant this is for the creation and working of
the notifier.

Ciao

On Tue, Jun 3, 2008 at 4:22 PM, Gene Henriksen
<[EMAIL PROTECTED]> wrote:

Putting the Notifier in the cluster service group also has an
advantage because the CSG is the first SG up and the hardest to kill, so
in times of lots of problems you are more likely to get notification. If
the service group you arbitrarily chose to use is faulted on all
systems in the cluster, then notification is also down.

You could create the CSG on one system, save the configuration,
run "hacf -cftocmd ." in the /etc/VRTSvcs/conf/config directory, then
edit the main.cmd (look toward the bottom) to find the commands to
create the CSG and Notifier, make a script and modify it to run on other
clusters.

From: John Cronin [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 03, 2008 10:45 AM
To: i man
Cc: Jim Senicka; Gene Henriksen; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] .stale file

It would be no problem to create a Notifier resource in any
arbitrary service group with the CLI.  If I understand this correctly,
what you are doing is shutting down VCS, and then editing main.cf to
change the config?  If this was for one or two clusters, it might be an
OK way to do it, but if this is for hundreds of systems, it would be
better to learn how to use the CLI and then script the changes.

Also, what is the problem with putting the notifier in the
ClusterService group?  I can't see how putting it in another service
group would provide you any particular benefit - the Notifier is going
to do the same things no matter which service group it is in.  Since it
is a cluster wide service, it makes sense that it should be in the
ClusterService group.

As for using "hastop -all -force", I tend to use it frequently
on production systems when I am doing something that requires stopping
the cluster, but does not require stopping the systems or the services
running on those systems (e.g. patching or upgrading VCS, or
reconfiguring GAB or LLT).  However, I would not do this to accomplish
something that can be done with CLI commands.

-- 

John Cronin

On 6/3/08, i man <[EMAIL PROTECTED]> wrote: 

Correct Jim. If this had been a normal cluster service
group I would have loved to do that. What I'm trying to do is
create the SNMP notifier in a separate service group. Through the GUI you
cannot create it in your own service group, only as
part of the ClusterService group. Not sure if this is achievable through the CLI.

Any suggestions ? 

On Tue, Jun 3, 2008 at 2:52 PM, Jim Senicka
<[EMAIL PROTECTED]> wrote:

Right.

But that can also be done via CLI or GUI with the cluster
running.

From: i man [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 03, 2008 9:48 AM
To: Jim Senicka
Cc: Gene Henriksen; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] .stale file

Jim,

This is to update systems with some new service groups. This is
not on a single system but rather a large number of systems (100+).

Also, many thanks to Gene and John for resolving my doubts.

Ciao,

On Tue, Jun 3, 2008 at 2:30 PM, Jim Senicka
<[EMAIL PROTECTED]> wrote:

Bigger question is what are you routinely using stop -force to
accomplish?

Re: [Veritas-ha] .stale file

2008-06-03 Thread Jim Senicka

Bigger question is what are you routinely using stop -force to
accomplish?
 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gene
Henriksen
Sent: Tuesday, June 03, 2008 8:17 AM
To: i man; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] .stale file



It indicates you did not close and save the cluster configuration after
making modifications. It is a warning. If you close and save the config,
it goes away.

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of i man
Sent: Tuesday, June 03, 2008 7:28 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] .stale file

 

All,

I have some queries regarding the .stale file present in the
/etc/VRTSvcs/conf/config directory. I know that if the haagents are
restarted with hastop -all -force and this file is present, the cluster
members could be in the stale admin wait state. I have been deleting this
file, then running hastop -all -force and then hastart on the nodes. I do not
want the service groups to go offline; that's why I use -force.

My query is: what is the use of .stale?
Would hastart -force help to get the nodes back if this file is present?
Is file deletion the only method to get the nodes back?

I noticed recently that when getting the cluster back this way, my
clusters lose the information about the admin password. I think I'm doing
something wrong. Any help?

Ciao.

Re: [Veritas-ha] .stale file

2008-06-03 Thread Jim Senicka

Right.
But that can also be done via CLI or GUI with the cluster running.

From: i man [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 03, 2008 9:48 AM
To: Jim Senicka
Cc: Gene Henriksen; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] .stale file

Jim,

This is to update systems with some new service groups. This is not on a
single system but rather a large number of systems (100+).

Also, many thanks to Gene and John for resolving my doubts.

Ciao,

On Tue, Jun 3, 2008 at 2:30 PM, Jim Senicka <[EMAIL PROTECTED]>
wrote:

Bigger question is what are you routinely using stop -force to
accomplish?

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gene
Henriksen
Sent: Tuesday, June 03, 2008 8:17 AM
To: i man; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] .stale file

It indicates you did not close and save the cluster
configuration after making modifications. It is a warning. If you close
and save the config, it goes away.

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of i man
Sent: Tuesday, June 03, 2008 7:28 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] .stale file

All,

I have some queries regarding the .stale file present in the
/etc/VRTSvcs/conf/config directory. I know that if the haagents are
restarted with hastop -all -force and this file is present, the cluster
members could be in the stale admin wait state. I have been deleting this
file, then running hastop -all -force and then hastart on the nodes. I do not
want the service groups to go offline; that's why I use -force.

My query is: what is the use of .stale?
Would hastart -force help to get the nodes back if this file is
present?
Is file deletion the only method to get the nodes back?

I noticed recently that when getting the cluster back this way,
my clusters lose the information about the admin password. I think I'm doing
something wrong. Any help?

Ciao.

Re: [Veritas-ha] Importance of NIC Proxy in Clusterservice group

2008-06-02 Thread Jim Senicka

Take a look at the BARG.
The NIC agent monitors the NIC.

From: i man [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 02, 2008 1:27 PM
To: Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Importance of NIC Proxy in Clusterservice
group

I basically have three network interfaces defined on the Network service
group.

Can you please explain what sort of monitoring is performed and by whom
?

On Mon, Jun 2, 2008 at 6:23 PM, Jim Senicka <[EMAIL PROTECTED]>
wrote:

You should be monitoring the NIC in some service group on the
box. A NIC Proxy is used to prevent duplicate monitoring by other
service groups

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of i man
Sent: Monday, June 02, 2008 12:13 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Importance of NIC Proxy in Clusterservice
group

All,

Can anybody let me know why is a NIC proxy required in
clusterservice group ?

Also is this necessary to create a NIC proxy in clusterservice
group for SNMP notifier which is created in separate service group.

Ciao.

Re: [Veritas-ha] Importance of NIC Proxy in Clusterservice group

2008-06-02 Thread Jim Senicka

You should be monitoring the NIC in some service group on the box. A NIC
Proxy is used to prevent duplicate monitoring by other service groups

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of i man
Sent: Monday, June 02, 2008 12:13 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Importance of NIC Proxy in Clusterservice group

All,

Can anybody let me know why is a NIC proxy required in clusterservice
group ?

Also is this necessary to create a NIC proxy in clusterservice group for
SNMP notifier which is created in separate service group.

Ciao.

Re: [Veritas-ha] question about hastop

2008-05-23 Thread Jim Senicka

hastop -force -all does not take down resources.
But why not add the resources online?
hastop -force -all is really only used for heavy lifting, like upgrading
VCS bits. You can add the resources on the fly using the CLI or GUI.
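
As a rough illustration of the "on the fly" route, the sketch below only builds and prints the usual sequence of VCS CLI commands for adding one resource while the cluster stays up: open the configuration read-write, add the resource, set its attributes, enable it, then dump and close the configuration. The group name, resource name, resource type and attribute value are hypothetical placeholders, not taken from the poster's cluster, and the exact syntax should be checked against the user guide for your release before running anything.

```python
import shlex

def add_resource_commands(group, resource, res_type, attrs):
    """Build the VCS CLI commands that add one resource to a running cluster."""
    cmds = ["haconf -makerw"]  # open the configuration read-write
    cmds.append(f"hares -add {resource} {res_type} {group}")
    for name, value in attrs.items():
        # Quote attribute values so spaces survive the shell.
        cmds.append(f"hares -modify {resource} {name} {shlex.quote(value)}")
    cmds.append(f"hares -modify {resource} Enabled 1")
    cmds.append("haconf -dump -makero")  # write main.cf and close the configuration
    return cmds

if __name__ == "__main__":
    # Hypothetical names: substitute your own group, resource and attributes.
    for cmd in add_resource_commands(
        "app_grp", "nsr_app", "Application", {"StartProgram": "/path/to/start"}
    ):
        print(cmd)
```

Printing the commands rather than executing them keeps the sketch safe to try anywhere; the output can be reviewed and then pasted into a shell on one cluster node.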

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Paveza,
Gary
Sent: Friday, May 23, 2008 10:13 AM
To: 'Veritas HA'
Subject: [Veritas-ha] question about hastop

I currently have a Veritas Cluster for RAC which really only is
responsible for mounting the filesystems for the cluster.  The database
start / stop and CSSD are handled via system startup scripts.  I need to
modify the main.cf file to add a resource for Networker.  

If I issue the hastop -all -force command (as outlined in the Networker
manual), will this shutdown the cluster and make the filesystems umount?
Or will everything remain up and running?

-
Gary Paveza, Jr.
AIG - Personal Lines Division
Technical Specialist - Architecture - HP CSE, SCSA
(302) 252-4831 - phone

Re: [Veritas-ha] Veritas Cluster Server 5.0 available for RHEL 5.x

2008-05-22 Thread Jim Senicka

5.0MP3 will add RHEL 5 support. Talk with your rep about release dates.
 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tom
Stephens
Sent: Thursday, May 22, 2008 11:48 AM
To: Goutham N; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Veritas Cluster Server 5.0 available for RHEL
5.x



Not according to the release notes for the product.  These can be found
at: 

 

ftp://exftpp.symantec.com/pub/support/products/ClusterServer_UNIX/283850.pdf (For Linux 5.0)

ftp://exftpp.symantec.com/pub/support/products/ClusterServer_UNIX/287175.pdf (For Linux 5.0 MP1)

ftp://exftpp.symantec.com/pub/support/products/ClusterServer_UNIX/289442.pdf (For Linux 5.0 MP2)

 

Tom

 

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Goutham
N
Sent: Thursday, May 22, 2008 1:40 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Veritas Cluster Server 5.0 available for RHEL 5.x

 

Is Veritas Cluster Server 5.0 available for RHEL Version 5 ?



-- 
N. Gowthaman 

Re: [Veritas-ha] excessive delay between successive calls to GAB

2008-04-21 Thread Jim Senicka

There are really only 2 possibilities.
1. HAD was blocking on something (HAD is a replicated state machine, so
it would block waiting on an update to get confirmed on all nodes). This
could be network related. 

2. HAD is swapped out. HAD is a high priority process, but still user
land. This means a lack of memory avail to user land processes could
cause same symptoms.

Based on what you have said here, I lean toward number 2. Keep an eye on
the cluster after you have lowered memory available to the kernel.

-Original Message-
From: Stoyan Angelov [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 21, 2008 11:20 AM
To: Jim Senicka; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] excessive delay between successive calls to
GAB

Jim Senicka wrote:
>  "Excessive delay between successive calls to GAB heartbeat"
> is HAD heart beating GAB, so the network should not be in the picture 
> (LLT heartbeat side), unless HAD was blocked trying to get a cluster 
> state update across all nodes and the network was acting weird.
> 
> 
> You hit it right looking at system load (need to look on all systems),
> plus look at network health on all (HAD could be spinning on a 
> communication that GAB is struggling to get out), and look to make 
> sure you have not run out of /tmp space anywhere, which might prevent 
> locking

hi Jim and all,

thank you for your fast answer. i checked the system load on both
systems again - cpu utilization and load are normal, however memory
usage was pretty high on both systems. file system usage was ok (both
/tmp and all other file systems have plenty of space) and there were no
recent network problems logged.

the memory usage i mentioned: both systems had very little free memory
available (the first node ~100Mb memory free and the second node ~80Mb).

i lowered the dbc_max_pct kernel parameter in order to take some memory
occupied by the os buffer cache. now there is a lot more memory
available...

can this memory shortage be related to the problems experienced by HAD ?

both machines have plenty of swap space available - and very little of
it is used. both nodes have uptime of ~ 500 days and never displayed
anything related to memory shortage in the system logs.

greetings,

Stoyan

Re: [Veritas-ha] excessive delay between successive calls to GAB

2008-04-20 Thread Jim Senicka

 "Excessive delay between successive calls to GAB heartbeat"
is HAD heart beating GAB, so the network should not be in the picture
(LLT heartbeat side), unless HAD was blocked trying to get a cluster
state update across all nodes and the network was acting weird.


You hit it right looking at system load (need to look on all systems),
plus look at network health on all (HAD could be spinning on a
communication that GAB is struggling to get out), and look to make sure
you have not run out of /tmp space anywhere, which might prevent locking
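
When comparing nodes, it can help to pull every occurrence of this warning out of the engine log so the timestamps can be lined up against system load and network events. The sketch below is one way to do that, assuming the log lines look like the sample quoted in this thread; the sample lines are embedded in the code, and reading the real log (commonly /var/VRTSvcs/log/engine_A.log, which is an assumption about your install) is left to the reader.

```python
import re

# Pattern for the HAD-to-GAB heartbeat warning, based on the log line quoted
# in this thread. The timestamp layout is taken from that same sample.
DELAY_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS WARNING V-16-1-10485 "
    r"Excessive delay between successive calls to GAB heartbeat "
    r"\((?P<secs>\d+) seconds\)"
)

def find_gab_delays(lines):
    """Return (timestamp, seconds) for every GAB heartbeat delay warning."""
    hits = []
    for line in lines:
        match = DELAY_RE.match(line.strip())
        if match:
            hits.append((match.group("ts"), int(match.group("secs"))))
    return hits

if __name__ == "__main__":
    sample = [
        "2007/08/21 18:16:04 VCS WARNING V-16-1-10485 Excessive delay between "
        "successive calls to GAB heartbeat (204 seconds)",
        "2007/08/21 18:16:04 VCS WARNING V-16-1-10023 Agent DiskGroup not "
        "sending alive messages since Tue Aug 21 18:15:49 2007",
    ]
    for ts, secs in find_gab_delays(sample):
        print(ts, secs)
```

Running the same scan on each node and comparing timestamps is a cheap way to tell whether the delay hit one system or the whole cluster at once.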



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Stoyan
Angelov
Sent: Sunday, April 20, 2008 2:36 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] excessive delay between successive calls to
GAB

Stoyan Angelov wrote:
> hello all,
> 
> i am running vcs 4.1 on two hp pa-risc nodes (rp4440) - the cluster 
> has been happily murmuring for more than 10 months now. several days 
> ago the following error appeared in the engine log:
> 
> 2007/08/21 18:16:04 VCS WARNING V-16-1-10023 Agent DiskGroup not 
> sending alive messages since Tue Aug 21 18:15:49 2007
> 
> the above was repeated for different Agents configured, then:
> 
> 2007/08/21 18:16:04 VCS WARNING V-16-1-10485 Excessive delay between 
> successive calls to GAB heartbeat (204 seconds)
> 
> 2007/08/21 18:16:16 VCS INFO V-16-1-10306 Resource 
> (Owner: unknown, Group: ) is offline on  
> (Previous State = OFFLINE)
> 
> several lines as the above, for all Groups, and different node, all 
> with the "Previous State = OFFLINE"
> 
> GAB uses the default timeout (15000 milliseconds). i checked both 
> nodes and there was no high cpu usage and system resources were not 
> utilized unusually during the time the error occurred.
> checking the ethernet interfaces that are configured for the GAB/LLT 
> interconnect i can not see any errors, links being down or hardware 
> problems (two separate NIC's are used on each machine connected with a
> cross-over ethernet cable).
> 
> i should say that the services were running fine and no maintenance 
> was needed.
> 
> i am wondering what could be causing the above timeouts and should i 
> troubleshoot the problem further (and how) ?
> 
> greetings,
> 
> Stoyan Angelov


hi all,

i never understood what caused the problem described above - exactly the
same thing happened a few hours ago on the same cluster. the same
conditions apply (no high load on the systems and no errors in the
system logs, root disks seem ok and no scsi errors appear). again the
services were not affected.

one thing that i see while using lltstat is that there are lots of "Snd
no links up" errors (42542 on the first node and 62125 on the second );
there are no other errors on either node.

any help will be greatly appreciated!


greetings,

Stoyan






Re: [Veritas-ha] bundled HP-UX vxfm/vxfs

2008-04-18 Thread Jim Senicka

With SFRAC already installed, shouldn't you have VCS already installed?  

Sent from my Nokia E62 handheld by goodlink.

 -Original Message-
From:   Shashi Kanth Boddula [mailto:[EMAIL PROTECTED]
Sent:   Friday, April 18, 2008 02:12 AM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject:[Veritas-ha] bundled HP-UX vxfm/vxfs

I get the below message whenever I install VCS:

"SFRAC version 4.1 includes VRTSvxvm version 4.1.010.  A more recent
version of VRTSvxvm, 4.1.011, is already installed.

CPI WARNING V-9-10-1400 In this situation VRTSvxvm version 4.1.011 will
not be installed or downgraded.
SFRAC version 4.1 may not operate correctly with this more recent package.
The VRTSvxvm package must be removed manually before version 4.1.010 can
be installed."

"SFRAC version 4.1 includes VRTSvxfs version 4.1.  A more recent version
of VRTSvxfs, 4.1.001, is already installed.
CPI WARNING V-9-10-1400 In this situation VRTSvxfs version 4.1.001 will
not be installed or downgraded.
SFRAC version 4.1 may not operate correctly with this more recent package.
The VRTSvxfs package must be removed manually before version 4.1 can be
installed."

Are there any known issues/problems if we proceed to install VCS without
removing the operating system bundled VxVM/VxFS, and continue to install VCS
with the operating system bundled VxVM/VxFS (not the VCS bundled VxVM/VxFS)?
Or can we simply ignore this message, and the cluster will operate normally
without any problems?

Re: [Veritas-ha] Coordinator disks

2008-04-11 Thread Jim Senicka

The fence driver won't start if the coordinator disk count is even.

Sent from my Nokia E62 handheld by goodlink.

 -Original Message-
From:   Joshua Fielden [mailto:[EMAIL PROTECTED]
Sent:   Friday, April 11, 2008 04:30 PM US Mountain Standard Time
To: Rongsheng Fang; veritas-ha@mailman.eng.auburn.edu
Subject:Re: [Veritas-ha] Coordinator disks

3 *or more*, but the count needs to be odd, so minimize the amount of time 
it is even -- coordinator races are decided by holding a majority, so you 
are exposed while the total number of disks is even.

Cheers,

jf

Sent by GoodLink (www.good.com)

 -Original Message-
From:   Rongsheng Fang [mailto:[EMAIL PROTECTED]
Sent:   Friday, April 11, 2008 04:09 PM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject:[Veritas-ha] Coordinator disks

Hi,

Does anybody know how many coordinator disks the coordinator disk  
group can have? The VCS installation guide says 3, but doesn't say if  
more is supported.

We currently have 3 coordinator disks configured for a VCS 5.0 MP1
cluster with I/O fencing enabled. We will need to shut down the array
where the coordinator disks reside temporarily (for a few days).
So I am wondering whether I can add another three coordinator disks from
another array to the coordinator disk group. This way the coordinator
disk group would still have 3 available coordinator disks while the
original three are down. Would this work? Or what's the best way to
deal with this situation?

Thanks,

Rongsheng

Re: [Veritas-ha] Coordinator disks

2008-04-11 Thread Jim Senicka

No. 
A. It must be an odd number. (otherwise no majority possible)
B. You cannot add online. 

You will need to bounce the cluster (or at least the fence driver) to move to 
the new array
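
The majority rule behind point A can be sketched in a few lines of Python. This only illustrates the arithmetic and is not how the fence driver is implemented; the function names are made up for the example.

```python
def has_majority(disks_won: int, total_disks: int) -> bool:
    """Return True if disks_won is a strict majority of total_disks."""
    return 2 * disks_won > total_disks


def tie_possible(total_disks: int) -> bool:
    """Two racing sides can split the disks evenly only if the count is even."""
    return total_disks % 2 == 0


if __name__ == "__main__":
    # 3 is the configured count; 6 is what adding three more disks would give.
    for total in (3, 6):
        half = total // 2
        print(total, "disks: tie possible =", tie_possible(total),
              "| holding", half, "is a majority =", has_majority(half, total))
```

With 3 disks, a racing node needs 2 to win. With 6 disks (the original three plus three added), each side could grab 3 and neither would hold a majority.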


Sent from my Nokia E62 handheld by goodlink.


 -Original Message-
From:   Rongsheng Fang [mailto:[EMAIL PROTECTED]
Sent:   Friday, April 11, 2008 04:09 PM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject:[Veritas-ha] Coordinator disks

Hi,

Does anybody know how many coordinator disks the coordinator disk  
group can have? The VCS installation guide says 3, but doesn't say if  
more is supported.

We currently have 3 coordinator disks configured for a VCS 5.0 MP1
cluster with I/O fencing enabled. We will need to shut down the array
where the coordinator disks reside temporarily (for a few days).
So I am wondering whether I can add another three coordinator disks from
another array to the coordinator disk group. This way the coordinator
disk group would still have 3 available coordinator disks while the
original three are down. Would this work? Or what's the best way to
deal with this situation?

Thanks,

Rongsheng

Re: [Veritas-ha] VCS -HA memory replacement on one node

2008-03-29 Thread Jim Senicka

Even the freeze is not 100% necessary. If you switch the groups first, and then 
shut down, nothing happens to the running groups. 
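
A rough dry-run sketch of that shorter sequence, in Python, which only builds the list of commands rather than running anything. The node and group names are placeholders, and it assumes the groups are switched with hagrp -switch; hastatus -sum and init 5 come from the procedure quoted below.

```python
def maintenance_plan(groups, repair_node, other_node):
    """Return (node, command) pairs for taking repair_node down for hardware work.

    Group and node names are placeholders supplied by the caller.
    """
    plan = []
    # Move every service group off the node that will be powered down.
    for group in groups:
        plan.append((repair_node, f"hagrp -switch {group} -to {other_node}"))
    # Confirm the groups are online on the surviving node before going further.
    plan.append((other_node, "hastatus -sum"))
    # Power down the node for the memory replacement.
    plan.append((repair_node, "init 5"))
    return plan


if __name__ == "__main__":
    for node, command in maintenance_plan(["app_sg"], "node1", "node2"):
        print(f"{node}: {command}")
```

Printing the plan first makes it easy to review the order with whoever is doing the hardware work before touching the cluster.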


Sent from my Nokia E62 handheld by goodlink.


 -Original Message-
From:   Sigmund Brandstaetter [mailto:[EMAIL PROTECTED]
Sent:   Saturday, March 29, 2008 12:18 AM US Mountain Standard Time
To: [EMAIL PROTECTED]; veritas-ha@mailman.eng.auburn.edu
Subject:Re: [Veritas-ha] VCS -HA memory replacement on one node

Sounds OK, but I don't think that hastop -force -all is really necessary;
freezing the groups on the other node should be just fine.

Cheers
Sigmund


On 3/28/08 10:14 PM, "upen" <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> We want to replace a memory module on one of the 2 nodes of an HA cluster
> running Solaris.
> 
> I have decided to do the following:
> 
> 1. Switch the service groups to the other node.
> 2. Freeze the service groups on that node.
> 3. Run hastop -force -all, then verify using hastatus -sum.
> 4. Run init 5 on the Solaris node where the memory module will be replaced.
> 5. Once the system is booted, run hastart on both nodes.
> 6. Unfreeze the SGs running on the other node.
> 
> 
> Is this procedure correct? Please advise.
> 
> Thanks


Re: [Veritas-ha] Solaris VCS with share stroage in VxFS format how?

2008-03-10 Thread Jim Senicka

I think you are confusing creating 3 coordinator disks for VCS I/O fencing 
with creating file systems.
VCS has no need for any shared file systems on its own. Any requirement 
mentioned in the docs for shared volumes would be for coordinator disks. 


Sent from my Nokia E62 handheld by goodlink.


 -Original Message-
From:   Jiann Chen [mailto:[EMAIL PROTECTED]
Sent:   Monday, March 10, 2008 09:15 PM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject:[Veritas-ha] Solaris VCS with share stroage in VxFS format how?

I have two Solaris 10 SPARC servers installed with VERITAS Storage Foundation 4.1 
and VERITAS Cluster Server software.
For VCS, I was able to finish the initial setup and both nodes are working properly.
Currently, both nodes are connected to shared Sun FC storage with 4 partitions 
created.

For the VCS setup, I am told to create two or three VxFS volumes on the shared 
storage for clustering purposes.
Can anyone provide me with the steps, please? 

*I was told I don't need VERITAS Cluster File System, since it is for both 
nodes to mount a shared volume at the same time.
All I need is to have one node host this VxFS volume while the other node is in 
standby mode. So VCS should do the work.

Thanks in advance!

JMC 

Re: [Veritas-ha] Question re SFRAC 5.0

2008-02-29 Thread Jim Senicka

Kelly,
That is not normal. If the DB is top of the tree, and set to non
critical, it should not cause the group to offline. Even after we
introduced FaultPropagation and ManageFaults the core
Critical/Non-Critical behavior should not have changed.
Can you open a case on this?
 

Jim


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Friday, February 29, 2008 11:57 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Question re SFRAC 5.0


We are testing a new SFRAC 5.0 cluster.  One of the scenarios is a
"shutdown abort" to one instance.  When we did this, it took the whole
group offline on that node even though the database is the top resource
in the dependency tree.  Is this normal behavior?  I don't remember this
ever happening before.  I remember it only taking the database offline
and leaving the mounts up.  The database is a non-critical resource with
nothing depending on it. 

Thanks in advance for your help!


Re: [Veritas-ha] VCS geographic edition

2008-02-28 Thread Jim Senicka

The VCS HA/DR edition provides the Global Cluster Option and replication
management agents. Speak with your Symantec sales rep to obtain the
bits.
 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of osk
Sent: Thursday, February 28, 2008 7:29 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] VCS geographic edition


Hi,
   Is it possible to provide high availability across geographic regions
using VCS, and which software do we need to use? Can you provide a
link to download the VCS software for geographic clustering?
 
Thanks in advance
 
Regards
Karthikeyan.N

-- 
winners don't do different things
they do things differently 

Re: [Veritas-ha] LLT crossed links

2008-02-19 Thread Jim Senicka

I disagree, as long as the SAP stuff is taken care of.
2 dedicated + 2 additional (even sharing a VLAN) is pretty good.
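
For reference, a small Python sketch that groups LLT links by VLAN to show which ones share a network and which are isolated. The interface names and VLAN numbers are the illustrative ones from Ceri's message below; the function names are made up for the example.

```python
from collections import defaultdict


def shared_vlans(links):
    """Map each VLAN carrying more than one LLT link to the links on it."""
    by_vlan = defaultdict(list)
    for interface, vlan in links.items():
        by_vlan[vlan].append(interface)
    return {vlan: sorted(names) for vlan, names in by_vlan.items() if len(names) > 1}


def isolated_links(links):
    """Return the links that have a VLAN to themselves."""
    crowded = {name for names in shared_vlans(links).values() for name in names}
    return sorted(name for name in links if name not in crowded)


if __name__ == "__main__":
    links = {"e1000g0": 2, "e1000g1": 3, "nxge0": 2, "nxge1": 4}
    print("shared:", shared_vlans(links))
    print("isolated:", isolated_links(links))
```

In this layout the two public links share VLAN 2, which is what triggers the "crossed links" warning, while the two heartbeat links each sit on their own VLAN.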



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Joshua
Fielden
Sent: Tuesday, February 19, 2008 11:40 AM
To: Ceri Davies
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] LLT crossed links

One can't set up a successful cluster planning for the best case -- one
has to plan for the worst case. 2, 4, or 40 links, the underlying
discipline doesn't change.

What happens, in the below scenario, when you lose both dedicated
heartbeats? You're left with two links on the same VLAN, which is
verboten.

Cheers,

jf


Sent by GoodLink (www.good.com)


 -Original Message-
From:   Ceri Davies [mailto:[EMAIL PROTECTED]
Sent:   Tuesday, February 19, 2008 09:35 AM US Mountain Standard Time
To: Joshua Fielden
Cc: veritas-ha@mailman.eng.auburn.edu
Subject:Re: [Veritas-ha] LLT crossed links

Even if I have four links?  The situation is that I have:

  e1000g0 - public interface, VLAN 2, say
  e1000g1 - heartbeat interface, VLAN 3
  nxge0   - public interface, VLAN 2
  nxge1   - heartbeat interface, VLAN 4

I don't see how having e1000g0 and nxge0 both on VLAN2 can cause the
problems you mention given the presence of the other high priority
links.  Are you certain that's the case?

Thanks,

Ceri

On Tue, Feb 19, 2008 at 09:28:55AM -0700, Joshua Fielden wrote:
> Having multiple LLT links on the same VLAN/network can cause a variety
> of problems such as split-brain scenarios, inability to rejoin the
> cluster, and cluster failures.
> 
> The heartbeats really need to be isolated from each other.
> 
> Cheers,
> 
> jf
> 
> 
> Sent by GoodLink (www.good.com)
> 
> 
>  -Original Message-
> From: Ceri Davies [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, February 19, 2008 09:25 AM US Mountain Standard Time
> To:   veritas-ha@mailman.eng.auburn.edu
> Subject:  [Veritas-ha] LLT crossed links
> 
> 
> I have a couple of clusters running Solaris 10, VCS 5.
> 
> I'm running IPMP on my public links and I want to configure each 
> public interface as a low-priority link.
> 
> Since they're connected to the same VLAN, when I start LLT I get the 
> following warning:
> 
>  llt: LLT WARNING V-14-1-10497 crossed links? link 0 and link 3 of
>  node 1 on the same network
> 
> I'm fully aware of what this means, but I'm not 100% sure if this is 
> likely to cause me a problem or whether it's just a warning in case I 
> thought I'd connected them to different VLANs.
> 
> Is this likely to be OK?  I have two other links per node which have a
> dedicated VLAN each.
> 
> Ceri
> --
> That must be wonderful!  I don't understand it at all.
>   -- Moliere

--
That must be wonderful!  I don't understand it at all.
  -- Moliere

Re: [Veritas-ha] how to take a resource offline, urgent

2008-01-11 Thread Jim Senicka

Offline the service group. That will take the storage offline as well.

Sent from my Nokia E62 handheld by goodlink.

 -Original Message-
From:   upen [mailto:[EMAIL PROTECTED]
Sent:   Friday, January 11, 2008 06:31 PM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject:[Veritas-ha] how to take a resource offline, urgent

Hi, there is going to be a SAN FLARE upgrade in our environment and the
HA cluster servers are connected to the SAN. The technician suggests
taking Oracle offline using VCS. I know a resource of Type: Oracle,
Name: BB60 is ONLINE in the list of Resources on the online server.

How do I take this resource offline, step by step?

Can I just take it offline using the web console (not the Java console but
the Veritas web console) in a browser?

I can see these options for the Oracle type resource:
online, offline, clear, probe, offline propagate.

Can I just click offline and it will be shut down safely, or do I have to
take the Application service group offline to take this resource offline?

What would be the safest way to take Oracle offline so that
there is no corruption?

Thank you,

-- 
upen,
emerge -uD life (Upgrade Life with dependencies)

Re: [Veritas-ha] Is VxVM mirror supported in VCS GCO option?

2007-12-28 Thread Jim Senicka

We made a decision to not support VxVM mirror in a GCO environment
because it breaks our ability to use SCSI-III based fencing. While you
could make the mirror work, it is not a Symantec supported
configuration. For dual cluster configs we would require some form of
replication.
 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gene
Henriksen
Sent: Friday, December 28, 2007 5:49 AM
To: Pavel A Tsvetkov; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Is VxVM mirror supported in VCS GCO option?



To mirror volumes you must be dealing with a relatively small distance,
such as less than 80K. For these distances, why not use a single cluster
called a "stretch" or "campus" cluster? In SF 5.0 there is the concept
of "site awareness" so that VM is aware of the two sites and if a volume
at the remote site becomes detached, then all volumes at the remote site
are detached thereby maintaining consistency of the site.

 

I have not heard of the limitation you mention, I do know that in a
Replicated Data Cluster (VVR within a cluster), synchronous replication
is required because unlike GCO there is nothing to prevent failover and
we don't want the cluster to experience failovers and take over with old
data automatically.

 

With mirroring, it certainly would be possible. As in the case of
replicating data we do not recommend automatic failover. Automatic
failover could result in split brain destroying the data if the link
between the two clusters were interrupted making it appear the primary
cluster was down.

 

A lot of configurations are possible, a lot will work, but they may not
be supported. I am not sure who told you this, but I would ask for an
explanation. One possible problem could be the loss of SAN between sites
for hours followed by a failover to the remote site with old data with
the VCS admin being unaware of the storage problem.

 

I think the primary concern is split brain. With replication, you are
working with two distinct data sets. If both sides become active due to
a loss of connectivity, the data is not being corrupted, the two sites
are just growing further apart. 

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pavel A
Tsvetkov
Sent: Friday, December 28, 2007 5:15 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Is VxVM mirror supported in VCS GCO option?

 


Hello all! 

Just one interesting question about VCS GCO. I was told that VxVM
mirroring is not supported when used with the Global Cluster Option, and
that only replicated volumes can be used.
Is that true? It seems strange to me. Why not? I think it is quite
possible to fail over a mirrored VxVM volume between clusters. Or not?



Kind regards 

Pavel Tsvetkov

[Veritas-ha] New VCS course!

2007-12-13 Thread Jim Senicka

Howdy all.
Education informed me that we have a new class online covering multiple
clusters.
 
"Our new course that includes GCO, Secure Clusters, CMC, Solaris Zones,
RemoteSG agent and the "campus cluster" capability in VM that allows
site tagging is now available. The schedule for it is as follows (all we
need is students):

 Oak Brook IL Jan 30 thru 1 Feb

Mountain View Feb 4-7

Herndon, VA Feb 20-22"

 


________



Jim Senicka
Senior Director, Technical Product Management
Server and Storage Management Group
Symantec Corporation
www.symantec.com <http://www.symantec.com/> 
-
Office: 757-766-0200
Mobile: 757-870-3484
Email: [EMAIL PROTECTED]
-
 



Re: [Veritas-ha] Inbound and outbound traffic

2007-12-13 Thread Jim Senicka

Not really a VCS issue.
It really depends on the IP stack of the OS, or on modifying the
application to bind to a specific IP.
Usually the source IP of an outbound packet will be the base
address (the first address configured) on that interface.
One possible solution is to put the base address on a different
subnet. That way only your VIP is on the subnet in use, and it will be the
first address configured on that subnet.
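
As an illustration of the "bind to a specific IP" option, here is a minimal Python sketch of a client that picks its own source address before connecting. The helper name connect_from is made up for the example, and the demo uses the loopback address only so that it runs anywhere; in practice the source address would be the VIP.

```python
import socket


def connect_from(source_ip, dest_host, dest_port, timeout=5.0):
    """Open a TCP connection whose local (source) address is source_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Binding before connect fixes the source address; port 0 lets the
        # OS choose an ephemeral source port.
        sock.bind((source_ip, 0))
        sock.connect((dest_host, dest_port))
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    # Self-contained demo: a throwaway listener on loopback stands in for
    # the remote service.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    client = connect_from("127.0.0.1", "127.0.0.1", port)
    print("source address used:", client.getsockname()[0])
    client.close()
    listener.close()
```

The bind call fails if the chosen address is not configured on the local host, so an application written this way would only connect from the node that currently holds the VIP.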
 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pablo
Calvo
Sent: Wednesday, December 12, 2007 10:37 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Inbound and outbound traffic



How can I set inbound and outbound traffic to use the same interface
(physical and virtual address)?

 

Uniqs S.A.

Sturiza 503 - Olivos 

Buenos Aires - Argentina

TE: (5411) 4711-7755/4799-5516

Cel: (54911) 53747697

 




Re: [Veritas-ha] best way for patching of cluster servers

2007-12-07 Thread Jim Senicka

We will get that resolved (Eric and I). 

Jim Senicka




Sent from my Nokia E62 handheld by goodlink.


 -Original Message-
From:   [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent:   Friday, December 07, 2007 01:11 PM US Mountain Standard Time
To: Eric Hennessey
Cc: veritas-ha@mailman.eng.auburn.edu
Subject:Re: [Veritas-ha] best way for patching of cluster servers

Hi Eric,

That's funny, I've been told by Veritas Support that Veritas does not
support nodes in the same Cluster running at different Solaris patch
levels, no less different versions of Solaris.

Jon








From: "Eric Hennessey" <[EMAIL PROTECTED]ymantec.com>
Sent by: veritas-ha-bounce[EMAIL PROTECTED]urn.edu
Sent: 12/07/2007 06:12 AM
To: <[EMAIL PROTECTED]>
cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] best way for patching of cluster servers






Hi Upen,

My guess is you spoke with Sun sales when you asked this question.  Try
rephrasing your question to your Sun contact.  Ask him/her if they will
support a collection of systems running Solaris 9 running at different
patch levels, without regard to them being clustered.  That you're running
VCS on these systems isn't Sun's support problem, it's ours, and we
unequivocally support mixing not only different patch levels but different
Solaris versions in the same cluster.  We do this so you can leverage the
cluster as an operational support tool to enable rolling upgrades of the OS
with a minimum of application down time.

The response you got sounds like it came from someone interested in selling
Sun Cluster.  Just because THEY won't support different patch levels and
Solaris versions in the same cluster doesn't mean WE won't.  :-)

Cheers!
Eric

From: upen [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 06, 2007 7:43 PM
To: Eric Hennessey
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] best way for patching of cluster servers

Thanks Eric

One question,

Does Veritas/Symantec provide support for patching Sun servers that are part
of a Veritas HA cluster? I contacted Sun for support (we have a valid gold
contract) but they refused to support us because the machines are part of a
Veritas cluster.

I don't have our Veritas contract number, but I know that our contract was
renewed and is valid. Is there any way I can find out from
Symantec/Veritas what my contract number is if I give them the
necessary information (machine serial and company info)? Whoever
renewed the contract at my workplace does not seem to be of much help
with the contract number.

I am not able to view the site properly on my Linux machine, and maybe I am
not looking in the right place. Can anyone give me a contact at
Veritas/Symantec where I can find my contract details and get support
for patching Sun servers after this?

Thanks
On Dec 6, 2007 10:25 AM, Eric Hennessey <[EMAIL PROTECTED]>
wrote:


The typical approach to applying OS patches in a clustered environment is
to patch an idle server, let it reboot and rejoin the cluster, and make
sure it's running OK.  If it is, use the cluster software to switch
application(s) from an active server to the one you just patched, and if
the app comes up successfully there, apply the patch to the server that's
now idle.  Keep doing this until all nodes in the cluster have been
patched.
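
The rolling order described above can be sketched as a small Python planner. It only produces a list of human-readable steps; the node names are placeholders and nothing here talks to a cluster.

```python
def rolling_patch_steps(active_nodes, idle_node):
    """Return the ordered steps for patching every node, one idle node at a time."""
    steps = []
    spare = idle_node
    for node in active_nodes:
        # Patch whichever node is currently idle and make sure it comes back.
        steps.append(f"patch {spare}, reboot, confirm it rejoins the cluster")
        # Move the workload onto the freshly patched node; 'node' is now idle.
        steps.append(f"switch applications from {node} to {spare}")
        spare = node
    # The last node to give up its applications still needs patching.
    steps.append(f"patch {spare}, reboot, confirm it rejoins the cluster")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(rolling_patch_steps(["server2"], "server1"), 1):
        print(number, step)
```

For a two-node cluster with everything running on server2, this gives three steps: patch server1, switch the applications over, then patch server2.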





Eric






From: [EMAIL PROTECTED] [mailto:
[EMAIL PROTECTED] On Behalf Of upen
Sent: Thursday, December 06, 2007 4:03 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] best way for patching of cluster servers





Hi

When patching standalone Sun servers, I can patch them and I know that
after a reboot everything will be fine.

I wanted to know how to patch Veritas HA clustered SunOS 5.9 machines. Right
now the cluster service and application services are running on Server 2. I
am not sure after patching if something messes the cluster or applications
running. Plea

Re: [Veritas-ha] connectivity delays

2007-11-28 Thread Jim Senicka

If you have multiple NICs on a single Sun box plugged into a Cisco
switch environment using multiple VLANs, having local-mac-address? set to
false may be causing spanning tree to reconverge. The switch fabric is
potentially discovering the same MAC address on different VLAN segments,
assuming there is a loop somewhere, and reconverging.
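
A quick way to check for that condition is to list which MAC addresses show up on more than one VLAN. The Python sketch below assumes you have already collected (interface, MAC, VLAN) tuples by hand; the interface names, MAC addresses and VLAN numbers in the demo are invented for illustration.

```python
from collections import defaultdict


def macs_on_multiple_vlans(ports):
    """Return {mac: sorted list of VLANs} for every MAC seen on more than one VLAN.

    ports is an iterable of (interface, mac, vlan) tuples.
    """
    vlans_by_mac = defaultdict(set)
    for _interface, mac, vlan in ports:
        vlans_by_mac[mac.lower()].add(vlan)
    return {mac: sorted(vlans) for mac, vlans in vlans_by_mac.items() if len(vlans) > 1}


if __name__ == "__main__":
    # With local-mac-address? set to false, every NIC reports the system MAC.
    ports = [
        ("ce0", "00:00:5E:00:53:01", 10),
        ("ce1", "00:00:5e:00:53:01", 20),
        ("ce2", "00:00:5e:00:53:01", 30),
    ]
    print(macs_on_multiple_vlans(ports))
```

An empty result means each MAC address appears on a single VLAN only, which is what you would expect once every interface uses its own address.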

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tihomir
Cavuzic
Sent: Wednesday, November 28, 2007 7:04 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] connectivity delays

Hello

Thanks for the advice. I usually ssh to one of the vIPs controlled
by the VCS, eg. 10.6.132.14. All the applications (for instance
diameter) also use vIPs.

Actually I didn't try to test whether there is a difference in
connectivity behaviour if I connect to physical IP. Will try to see if
there is a difference.

If it is as Jim says (that VCS just uses ifconfig), then I don't think
the problem has to do with VCS, it has to be something lower (Solaris or
switches). Simply because the service groups are very stable and
failovers do not occur.

Local MAC address is set to FALSE. You said this could be a problem? I'll
try to investigate in that direction...

Thank you both!! Tihomir

[EMAIL PROTECTED] /]# eeprom
test-args: data not available.
diag-passes=1
local-mac-address?=false 


Re: [Veritas-ha] connectivity delays

2007-11-28 Thread Jim Senicka

Sorry,
Should have been more clear.

VCS is capable of bringing IP addresses up and down. It uses standard OS
commands to do so (ifconfig in this case). It does not have anything in
the data path at all. 
The only time VCS could possibly be involved is if a failover of a VCS
managed IP address were occurring while you were doing a telnet.



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jim
Senicka
Sent: Wednesday, November 28, 2007 6:37 AM
To: Tihomir Cavuzic; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] connectivity delays

What address do you telnet to?
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tihomir
Cavuzic
Sent: Wednesday, November 28, 2007 4:43 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] connectivity delays

Hello,

Let me introduce my little connectivity question, maybe VCS-related:

VCS 4.1, 2 Netras 440 with Solaris 10, 5 service groups, one of them is
network. Config files attached. The problem is that often we experience
connectivity delays which are demonstrated for instance by telnet
hold-ups, temporary outages of diameter links and similar, all in
duration of couple of seconds. As soon as it is over, everything goes
back to normal, telnet buffer is emptied, diameter links are up again
automatically etc.

Is there any chance this could have something to do with VCS, or I
should be looking only to Solaris, switch (port) configuration and
ethernet interfaces on my Solaris boxes? I ask this since many boxes are
connected to the same switch, switch ports are uniformly configured, and
still only my machines have trouble with delays. The only difference is
that only my machines have VCS and Solaris 10 -- all the others have
Solaris 8/9 and no VCS.

Sorry if it sounds trivial, I'm just not sure where to start looking
into...

Thanks/Regards
Tihomir


Re: [Veritas-ha] connectivity delays

2007-11-28 Thread Jim Senicka

What address do you telnet to?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tihomir
Cavuzic
Sent: Wednesday, November 28, 2007 4:43 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] connectivity delays

Hello,

Let me introduce my little connectivity question, maybe VCS-related:

VCS 4.1, 2 Netras 440 with Solaris 10, 5 service groups, one of them is
network. Config files attached. The problem is that often we experience
connectivity delays which are demonstrated for instance by telnet
hold-ups, temporary outages of diameter links and similar, all in
duration of couple of seconds. As soon as it is over, everything goes
back to normal, telnet buffer is emptied, diameter links are up again
automatically etc.

Is there any chance this could have something to do with VCS, or I
should be looking only to Solaris, switch (port) configuration and
ethernet interfaces on my Solaris boxes? I ask this since many boxes are
connected to the same switch, switch ports are uniformly configured, and
still only my machines have trouble with delays. The only difference is
that only my machines have VCS and Solaris 10 -- all the others have
Solaris 8/9 and no VCS.

Sorry if it sounds trivial, I'm just not sure where to start looking
into...

Thanks/Regards
Tihomir

Re: [Veritas-ha] Interconnect hardware specifications

2007-11-27 Thread Jim Senicka

A switch? No.
2 switches? Ok.

We would be looking for 100BaseT or Gigabit, full duplex. Not so much
from a bandwidth standpoint, just reliability. Full duplex removes
collision issues.

No problems with dedicated switches per interconnect network 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Stefhen
Hovland
Sent: Tuesday, November 27, 2007 3:02 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Interconnect hardware specifications

Does anyone have any information on the minimum hardware that should
be used for VCS interconnects? We have some production boxes running
with a Linksys switch in between the hosts and I would like to know for
sure if this is a good idea or not.


Thanks,
Stefhen

Re: [Veritas-ha] SF/HA 5.0 on Solaris 9: "HAD Self Check" error

2007-11-13 Thread Jim Senicka

Sorry Randy,
That was not a case of saying "dunno". HAD not heartbeating to GAB is
usually indicative of a system load issue or something blocking the
ability of HAD to open necessary lock files. These are general
statements, as this can happen in any environment and should be easy to
track down.
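
When tracking this down, it can help to pull the longest reported gap out of syslog. The Python sketch below scans text for the GAB "inactive N sec" warnings shown in the quoted log further down; the function name is made up, and the pattern is based only on the message format visible in that excerpt.

```python
import re

# Matches the warning format seen in the quoted /var/adm/messages excerpt.
INACTIVE = re.compile(
    r"GAB WARNING V-15-1-20057 Port (\w+) process (\d+) inactive (\d+) sec"
)


def longest_inactivity(log_text):
    """Return {(port, pid): longest inactivity in seconds} found in log_text."""
    # Mail clients wrap long syslog lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    worst = {}
    for port, pid, seconds in INACTIVE.findall(flat):
        key = (port, int(pid))
        worst[key] = max(worst.get(key, 0), int(seconds))
    return worst


if __name__ == "__main__":
    sample = """Nov 13 15:59:17 drp-db-1 gab: [ID 272231 kern.notice] GAB
WARNING V-15-1-20057 Port h process 140 inactive 13 sec
Nov 13 15:59:18 drp-db-1 gab: [ID 272231 kern.notice] GAB
WARNING V-15-1-20057 Port h process 140 inactive 14 sec"""
    print(longest_inactivity(sample))
```

Comparing the timestamps of the longest gaps with system load or file system activity at the same moment is usually the quickest way to narrow down the cause.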

Specific questions, or more difficult to solve issues need to be opened
as a support case.
This is a general discussion forum, not a support avenue for VCS.
Since the support guys have access to explorer output, core files, and
far more day to day experience, they can answer far better.
 
 


From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Randy
Slead
Sent: Tuesday, November 13, 2007 2:43 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] SF/HA 5.0 on Solaris 9: "HAD Self Check" error


I have seen this on all versions of VCS (4/5), even at 10% system
utilization.  And Symantec saying "I dunno" is not helpful.

Jim Senicka <[EMAIL PROTECTED]> wrote: 

HAD is not talking to GAB.
Excessive system utilization, or a blocked /var file system or
some such issue.
 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Marianne
Van Den Berg
Sent: Tuesday, November 13, 2007 1:17 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] SF/HA 5.0 on Solaris 9: "HAD Self Check"
error


Hi all
 
Brand new installation - 2-node cluster, Solaris 9 with latest
O/S patches, SF/HA 5.0 with MP1.
 
IPMultiNICB config'ed as parallel sg (using mpathd) and
ClusterService group.
 
Getting these errors about 3 minutes after hastart.   Any
ideas??
 
/var/adm/messages:
 
Nov 13 15:59:11 drp-db-1 gab: [ID 272231 kern.notice] GAB
WARNING V-15-1-20057 Port h process 140 inactive 7 sec
Nov 13 15:59:12 drp-db-1 gab: [ID 272231 kern.notice] GAB
WARNING V-15-1-20057 Port h process 140 inactive 8 sec
Nov 13 15:59:13 drp-db-1 gab: [ID 272231 kern.notice] GAB
WARNING V-15-1-20057 Port h process 140 inactive 9 sec
Nov 13 15:59:14 drp-db-1 gab: [ID 272231 kern.notice] GAB
WARNING V-15-1-20057 Port h process 140 inactive 10 sec
Nov 13 15:59:15 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS
WARNING V-16-1-51047 HAD Self Check: Excessive delay in the HAD
heartbeat to GAB (10 seconds)
Nov 13 15:59:15 drp-db-1 gab: [ID 272231 kern.notice] GAB
WARNING V-15-1-20057 Port h process 140 inactive 11 sec
Nov 13 15:59:16 drp-db-1 gab: [ID 272231 kern.notice] GAB
WARNING V-15-1-20057 Port h process 140 inactive 12 sec
Nov 13 15:59:17 drp-db-1 gab: [ID 272231 kern.notice] GAB
WARNING V-15-1-20057 Port h process 140 inactive 13 sec
Nov 13 15:59:18 drp-db-1 gab: [ID 272231 kern.notice] GAB
WARNING V-15-1-20057 Port h process 140 inactive 14 sec
Nov 13 15:59:19 drp-db-1 gab: [ID 191522 kern.notice] GAB
WARNING V-15-1-20058 Port h process 140: heartbeat failed, killing
process
Nov 13 15:59:19 drp-db-1 gab: [ID 975177 kern.notice] GAB INFO
V-15-1-20059 Port h heartbeat interval 15000 msec. Statistics:
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO
V-15-1-20129 Port h: heartbeats in 0 ~ 3000 msec: 3869
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO
V-15-1-20129 Port h: heartbeats in 3000 ~ 6000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO
V-15-1-20129 Port h: heartbeats in 6000 ~ 9000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO
V-15-1-20129 Port h: heartbeats in 9000 ~ 12000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO
V-15-1-20129 Port h: heartbeats in 12000 ~ 15000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 259915 kern.notice] GAB INFO
V-15-1-20094 number of processes:   158
Nov 13 15:59:19 drp-db-1 gab: [ID 631272 kern.notice] GAB INFO
V-15-1-20095 load average in 1 min: 0. 6
Nov 13 15:59:19 drp-db-1 gab: [ID 587815 kern.notice] GAB INFO
V-15-1-20096 load average in 5 min: 0. 8
Nov 13 15:59:19 drp-db-1 gab: [ID 980060 kern.notice] GAB INFO
V-15-1-20097 load average in 15 min:0.10
Nov 13 15:59:19 drp-db-1 gab: [ID 559196 kern.notice] GAB INFO
V-15-1-20098 pagein rate:   0
Nov 13 15:59:19 drp-db-1 gab: [ID 582491 kern.notice] GAB INFO
V-15-1-20099 pageout rate:  0
Nov 13 15:59:19 drp-db-1 gab: [ID 940236 kern.notice] GAB INFO
V-15-1-20041 Port h: client process failure: killing process
Nov 13 15:59:19 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS
WARNING V-16-1-53034 HAD Signal SIGABRT received
Nov 13 15:59:19 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS
NOTICE V-16-1-

Re: [Veritas-ha] SF/HA 5.0 on Solaris 9: "HAD Self Check" error

2007-11-13 Thread Jim Senicka

HAD is not talking to GAB.
Excessive system utilization, or a blocked /var file system or some such
issue.
 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Marianne
Van Den Berg
Sent: Tuesday, November 13, 2007 1:17 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] SF/HA 5.0 on Solaris 9: "HAD Self Check" error


Hi all
 
Brand new installation - 2-node cluster, Solaris 9 with latest O/S
patches, SF/HA 5.0 with MP1.
 
IPMultiNICB config'ed as parallel sg (using mpathd) and ClusterService
group.
 
Getting these errors about 3 minutes after hastart.   Any ideas??
 
/var/adm/messages:
 
Nov 13 15:59:11 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING
V-15-1-20057 Port h process 140 inactive 7 sec
Nov 13 15:59:12 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING
V-15-1-20057 Port h process 140 inactive 8 sec
Nov 13 15:59:13 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING
V-15-1-20057 Port h process 140 inactive 9 sec
Nov 13 15:59:14 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING
V-15-1-20057 Port h process 140 inactive 10 sec
Nov 13 15:59:15 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS WARNING
V-16-1-51047 HAD Self Check: Excessive delay in the HAD heartbeat to GAB
(10 seconds)
Nov 13 15:59:15 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING
V-15-1-20057 Port h process 140 inactive 11 sec
Nov 13 15:59:16 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING
V-15-1-20057 Port h process 140 inactive 12 sec
Nov 13 15:59:17 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING
V-15-1-20057 Port h process 140 inactive 13 sec
Nov 13 15:59:18 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING
V-15-1-20057 Port h process 140 inactive 14 sec
Nov 13 15:59:19 drp-db-1 gab: [ID 191522 kern.notice] GAB WARNING
V-15-1-20058 Port h process 140: heartbeat failed, killing process
Nov 13 15:59:19 drp-db-1 gab: [ID 975177 kern.notice] GAB INFO
V-15-1-20059 Port h heartbeat interval 15000 msec. Statistics:
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO
V-15-1-20129 Port h: heartbeats in 0 ~ 3000 msec: 3869
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO
V-15-1-20129 Port h: heartbeats in 3000 ~ 6000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO
V-15-1-20129 Port h: heartbeats in 6000 ~ 9000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO
V-15-1-20129 Port h: heartbeats in 9000 ~ 12000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO
V-15-1-20129 Port h: heartbeats in 12000 ~ 15000 msec: 0
Nov 13 15:59:19 drp-db-1 gab: [ID 259915 kern.notice] GAB INFO
V-15-1-20094 number of processes:   158
Nov 13 15:59:19 drp-db-1 gab: [ID 631272 kern.notice] GAB INFO
V-15-1-20095 load average in 1 min: 0. 6
Nov 13 15:59:19 drp-db-1 gab: [ID 587815 kern.notice] GAB INFO
V-15-1-20096 load average in 5 min: 0. 8
Nov 13 15:59:19 drp-db-1 gab: [ID 980060 kern.notice] GAB INFO
V-15-1-20097 load average in 15 min:0.10
Nov 13 15:59:19 drp-db-1 gab: [ID 559196 kern.notice] GAB INFO
V-15-1-20098 pagein rate:   0
Nov 13 15:59:19 drp-db-1 gab: [ID 582491 kern.notice] GAB INFO
V-15-1-20099 pageout rate:  0
Nov 13 15:59:19 drp-db-1 gab: [ID 940236 kern.notice] GAB INFO
V-15-1-20041 Port h: client process failure: killing process
Nov 13 15:59:19 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS WARNING
V-16-1-53034 HAD Signal SIGABRT received
Nov 13 15:59:19 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS NOTICE
V-16-1-53038 Beginning execution of the diagnostics script
Nov 13 15:59:21 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS NOTICE
V-16-1-53039 Completed execution of the diagnostics script
Nov 13 15:59:22 drp-db-1 gab: [ID 397130 kern.notice] GAB INFO
V-15-1-20032 Port h closed
Nov 13 15:59:22 drp-db-1 syslog[29181]: [ID 702911 daemon.notice] VCS
ERROR V-16-1-11103 VCS exited. It will restart
 
 
 
had restarts, but the same thing happens again after a couple of
minutes.
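
The inactivity countdown in a log like the one above can be summarised with standard tools. This is only a minimal sketch: it assumes the GAB warning keeps the "V-15-1-20057 ... inactive N sec" wording shown in the excerpt, and max_gab_inactive is a hypothetical helper name, not part of VCS.

```shell
#!/bin/sh
# Report the longest "inactive N sec" value GAB logged (message V-15-1-20057).
# Reads syslog text on stdin and prints 0 if no such message is found.
# max_gab_inactive is a hypothetical helper name used for illustration.
max_gab_inactive() {
    awk '
        /V-15-1-20057/ {
            # Find the word "inactive" and take the number that follows it.
            for (i = 1; i < NF; i++)
                if ($i == "inactive" && $(i + 1) + 0 > max)
                    max = $(i + 1) + 0
        }
        END { print max + 0 }
    '
}

# Usage (path as in the post above):
#   max_gab_inactive < /var/adm/messages
```

A value that climbs toward the heartbeat interval reported by GAB (15000 msec in this excerpt) shortly before the "heartbeat failed, killing process" message matches the sequence shown above.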
 
Regards
 
Marianne
 
___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] five clusters in one global cluster?

2007-11-12 Thread Jim Senicka

We do not have plans for more than 4. Since GCO connectivity is only needed for 
direct cluster to cluster swap of a specific service group, we have not seen a 
situation where one cluster has a replicated relationship with more than 3 
other clusters.  For managing DR, the VCS Management Console can handle any 
number of clusters.  





-----Original Message-----
From: Pavel A Tsvetkov [mailto:[EMAIL PROTECTED]
Sent: Monday, November 12, 2007 03:01 AM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] five clusters in one global cluster?

Hello all!

Does Symantec plan to implement five clusters in one global cluster?

Thanks!

Regards Pavel Tsvetkov

___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] vxio + cpu panic

2007-10-27 Thread Jim Senicka

Please open a support case.

Also, this is the VCS list. VxVM and VxFS go under veritas-vx
 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Örs Tiszay O
Sent: Saturday, October 27, 2007 6:09 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] vxio + cpu panic

Hi,

I have a cluster of 2 Sun Fire T2000 servers running Solaris 10, with VRTSvcs 
4.1MP1 and VRTSvxvm VERITAS-4.1_MP1_PointPatch1.2:2006-03-01. I'm not quite sure 
what I did last, but one of the servers keeps panicking on boot. It always dies 
at the same point, after vxfen startup. This is what I have on the console:


Oct 27 23:06:58 xxx vxfen: NOTICE: VCS FEN INFO V-11-1-35 Fencing driver going into RUNNING state
Oct 27 23:07:04 xxx su: pam_unix_cred: project.max-sem-ids resource control assignment failed for project "group.oinstall"
Oct 27 23:07:06 xxx vxfen: NOTICE: VCS FEN INFO V-11-1-34 The ioctl 
VXFEN_IOWARNING: VxVM vxio V-5-0-357 volcvm_gab_join: GLM_API_REGISTER failed: 2

panic[cpu19]/thread=2a1021ddcc0: mutex_enter: bad mutex, lp=70502c80 
owner=2a102bbdcc0 thread=2a1021ddcc0

02a1033516d0 vxglm:vxg_api_unregister+30 (70503ca0, 10, 8, 2, 20, 185f1cc)
  %l0-3: 0140 70035c00  
  %l4-7: 70052910 70033518 70033528 700561f0 
02a103351780 vxio:volcvm_reservegab_leave+34 (70033528, 70033520, 70033538, 
1, 0, 0)
  %l0-3: 7af484e8 70035c00  
  %l4-7: 70052910 70033518 70033528 700561f0 
02a103351830 vxio:volcvm_gab_joindone+150 (70052910, 10, 8, 2, 20, 185f1cc)
  %l0-3: 7af481e8 70035c00  
  %l4-7: 70052910 70033518 70033528 700561f0 
02a1033518e0 vxio:volcvm_gab_join+7a4 (60001832600, 2, 70033538, 1, 0, 0)
  %l0-3: 7af481e8 70035c00  
  %l4-7: 70052910 70033518 70033528 700561f0 
02a103351a10 vxio:dmp_quiesce_sio+be1fed0 (0, 0, 183cf40, 183cf40, 
300017e8000, 0)
  %l0-3:  70035eec  
  %l4-7:    

Could anyone give me some pointers on where to start looking?


tia,
Ors

___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu 
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha


Re: [Veritas-ha] Adding a LUN in Veritas Cluster

2007-10-09 Thread Jim Senicka

Nothing, unless you use Volume resources in the dependency tree.


-----Original Message-----
From: Artur Baruchi [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 09, 2007 05:46 PM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Adding a LUN in Veritas Cluster

Hi,

After the server recognizes a LUN, what are the steps to add the LUN
to Veritas Cluster? I already have a VG that is shared.

Thanks,
Artur Baruchi
___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha


Re: [Veritas-ha] OnlineRetryLimit weird behaviour

2007-10-09 Thread Jim Senicka

Ok,
I was wrong here.
This is available at group level.
Let me do some digging
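
Since the attribute name exists at more than one level, it can help to look at both values side by side. The following is only a sketch: the group name "mygroup" and the type name "Application" are placeholders, and the wrapper defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Show OnlineRetryLimit at the group level and at the resource-type level.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 on a
# cluster node to execute them. Function names are hypothetical.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

show_retry_limits() {
    # $1 = service group name, $2 = resource type name (both placeholders)
    run hagrp -display "$1" -attribute OnlineRetryLimit
    run hatype -display "$2" -attribute OnlineRetryLimit
}

show_retry_limits mygroup Application
```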



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jim
Senicka
Sent: Tuesday, October 09, 2007 12:37 PM
To: Gurugunti, Mahesh; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] OnlineRetryLimit weird behaviour


Online Retry Limit sets how many times to attempt to online a resource
when initial attempt fails.
This is not a service group setting



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Gurugunti, Mahesh
Sent: Tuesday, October 09, 2007 11:54 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] OnlineRetryLimit weird behaviour


I set OnlineRetryLimit = 1 for a service group, but the service group keeps
on restarting more than once in spite of this setting.
 
Any ideas?
 
Mahesh




-
The information contained in this transmission may be privileged and
confidential and is intended only for the use of the person(s) named
above. If you are not the intended recipient, or an employee or agent
responsible
for delivering this message to the intended recipient, any review,
dissemination,
distribution or duplication of this communication is strictly
prohibited. If you are
not the intended recipient, please contact the sender immediately by
reply e-mail
and destroy all copies of the original message. Please note that we do
not accept
account orders and/or instructions by e-mail, and therefore will not be
responsible
for carrying out such orders and/or instructions.  If you, as the
intended recipient
of this message, the purpose of which is to inform and update our
clients, prospects
and consultants of developments relating to our services and products,
would not
like to receive further e-mail correspondence from the sender, please
"reply" to the
sender indicating your wishes.  In the U.S.: 1345 Avenue of the
Americas, New York,
NY 10105.   
___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] OnlineRetryLimit weird behaviour

2007-10-09 Thread Jim Senicka

Online Retry Limit sets how many times to attempt to online a resource
when initial attempt fails.
This is not a service group setting



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Gurugunti, Mahesh
Sent: Tuesday, October 09, 2007 11:54 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] OnlineRetryLimit weird behaviour


I set OnlineRetryLimit = 1 for a service group, but the service group keeps
on restarting more than once in spite of this setting.
 
Any ideas?
 
Mahesh




___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] change cluster node

2007-09-11 Thread Jim Senicka

hagrp -switch command.
 
I would highly recommend taking the 5 day VCS users course offered by my
colleagues in Symantec Education. They do a great job of getting you up
to speed on the technology as well as all basic operational matters such
as this.
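
For the case in this thread, the switch could look like the sketch below. The group and node names come from the hastatus output in the question; the switch_group wrapper is hypothetical and defaults to printing the command instead of running it.

```shell
#!/bin/sh
# Switch a service group to another node with hagrp -switch.
# DRY_RUN=1 (the default here) only prints the command; set DRY_RUN=0 on a
# cluster node to execute it. switch_group is a hypothetical wrapper name.
DRY_RUN=${DRY_RUN:-1}

switch_group() {
    sg_name=$1
    sg_target=$2
    if [ "$DRY_RUN" = "1" ]; then
        echo "hagrp -switch $sg_name -to $sg_target"
    else
        hagrp -switch "$sg_name" -to "$sg_target"
    fi
}

# ClusterService is online on node1 and bb on node2, so moving
# ClusterService puts both groups on node2.
switch_group ClusterService node2
```

Running hastatus -summary afterwards shows where each group ended up.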
 
 



From: upen [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 11, 2007 11:10 AM
To: Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] change cluster node


Thanks Jim,

I am not familiar with how clustering works, but colleagues suggested
that I move it to node2.

So, Cluster Service is the controlling node for the Cluster web
services. But can I switch the Cluster Service to node2? And how?

Thanks,


On 9/11/07, Jim Senicka <[EMAIL PROTECTED]> wrote: 

The ClusterService group is a VCS thing. It will not affect your
app at all, and does not need to be running for your application to run.
It is there for the Web UI and to host the connector if GCO is
configured.



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of upen
Sent: Tuesday, September 11, 2007 10:45 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] change cluster node



Hi,

Following is the result of hastatus -summary.

I want to bring both groups onto one node, with the least downtime,
if any.

bb is the service group, while ClusterService is the cluster group.

How do I bring ClusterService ONLINE on node2 and OFFLINE on node1, so
that both service groups are on a single node?

Also, will this involve any downtime for application services?


 hastatus -summary

-- SYSTEM STATE
-- System          State      Frozen

A  node1           RUNNING    0
A  node2           RUNNING    0

-- GROUP STATE
-- Group           System     Probed   AutoDisabled   State

B  ClusterService  node1      Y        N              ONLINE
B  ClusterService  node2      Y        N              OFFLINE
B  bb              node1      Y        N              OFFLINE
B  bb              node2      Y        N              ONLINE


I am new to VCS, so please help with complete commands.

Thanks in advance.

- 
upen,
emerge -uD life (Upgrade Life with dependencies) 




-- 
upen,
emerge -uD life (Upgrade Life with dependencies) 
___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] change cluster node

2007-09-11 Thread Jim Senicka

The ClusterService group is a VCS thing. It will not affect your app at
all, and does not need to be running for your application to run. It is
there for the Web UI and to host the connector if GCO is configured.

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of upen
Sent: Tuesday, September 11, 2007 10:45 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] change cluster node

Hi,

Following is the result of hastatus -summary.

I want to bring both groups onto one node, with the least downtime, if
any.

bb is the service group, while ClusterService is the cluster group.

How do I bring ClusterService ONLINE on node2 and OFFLINE on node1, so
that both service groups are on a single node?

Also, will this involve any downtime for application services?

 hastatus -summary

-- SYSTEM STATE
-- System          State      Frozen

A  node1           RUNNING    0
A  node2           RUNNING    0

-- GROUP STATE
-- Group           System     Probed   AutoDisabled   State

B  ClusterService  node1      Y        N              ONLINE
B  ClusterService  node2      Y        N              OFFLINE
B  bb              node1      Y        N              OFFLINE
B  bb              node2      Y        N              ONLINE

I am new to VCS, so please help with complete commands.

Thanks in advance.

- 
upen,
emerge -uD life (Upgrade Life with dependencies) 
___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] Conversion from Assymetric to Symmetric VCS Cluster

2007-09-11 Thread Jim Senicka

Add a second service group and set its auto start list to have node B
first
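
From the command line, that suggestion might be sketched as follows. The group name "sg_b" and the node names "nodeB" and "nodeA" are placeholders, the wrapper functions are hypothetical, and the resources for the new group still have to be added separately. By default the script only prints the commands.

```shell
#!/bin/sh
# Create a second service group whose AutoStartList prefers node B.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 on a
# cluster node to execute them. Group and node names are placeholders.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_group_preferring() {
    # $1 = new group, $2 = preferred node, $3 = failover node
    run haconf -makerw                                  # open config for writing
    run hagrp -add "$1"
    run hagrp -modify "$1" SystemList "$2" 0 "$3" 1     # lower number = higher priority
    run hagrp -modify "$1" AutoStartList "$2" "$3"      # preferred node listed first
    run haconf -dump -makero                            # save and close config
}

add_group_preferring sg_b nodeB nodeA
```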

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Shivalingam Vanam
Sent: Tuesday, September 11, 2007 9:38 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Conversion from Assymetric to Symmetric VCS
Cluster

Hi, 

  Can someone point me to the documentation on the subject matter? We
would like to create a new SG on the "B" node by doing so.

Thanks
VSL


___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] regarding veritas ha and apache logs from blackboard

2007-09-05 Thread Jim Senicka

This is pretty much an apache issue, not VCS. If you need to bounce apache to 
make it happen, you would simply freeze the service group while doing so to 
keep VCS from reacting, or use VCS to stop/start apache. 
As for the command to clear logs, I cannot help you there. 
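
The freeze approach could be sketched like this. The group name "websg" is a placeholder, "apachectl restart" is an assumption about how Apache is bounced on these nodes, and the wrapper functions are hypothetical. By default the script only prints the commands.

```shell
#!/bin/sh
# Freeze a service group, run a maintenance command, then unfreeze.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 on a
# cluster node to execute them. "websg" and "apachectl restart" are
# placeholders for the real group name and restart command.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bounce_frozen() {
    bf_group=$1
    shift
    run hagrp -freeze "$bf_group"      # keep VCS from reacting
    run "$@"                           # the maintenance command itself
    run hagrp -unfreeze "$bf_group"    # hand control back to VCS
}

bounce_frozen websg apachectl restart
```

The group is unfrozen even if the maintenance command fails, so the resource state should be checked before relying on VCS again.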




-----Original Message-----
From: upen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 05, 2007 12:21 PM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] regarding veritas ha and apache logs from blackboard

Hi, we are using the Blackboard application with Apache 1.3.33 on
Sun nodes in a Veritas HA cluster.

I was told that if the Apache logs grow beyond 2 GB, the Blackboard
application misbehaves.

How can I clear the logs, or move them elsewhere and truncate them to
zero size? Please let me know the procedure that keeps application
downtime to a minimum.

Thanks in advance
upendra


-- 
upen,
emerge -uD life (Upgrade Life with dependencies)
___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha


Re: [Veritas-ha] what shall "monitor" do on nodes not running a SG ?

2007-08-16 Thread Jim Senicka

Good point Eric.
"Unexpected Offline" will be logged by the engine. The fact that the monitor 
determines offline is not relevant in context of the monitor and should not be 
logged
 

-Original Message-
From: Eric Hennessey 
Sent: Thursday, August 16, 2007 2:23 PM
To: Charles Bueche; Jim Senicka
Cc: Gene Henriksen; veritas-ha@mailman.eng.auburn.edu
Subject: RE: [Veritas-ha] what shall "monitor" do on nodes not running a SG ?

Better yet, you should not issue this "error" at all.

Think of the monitor entry point as non-judgmental.  It merely reports whether 
a resource is online or offline on a given node.  If $HTTPDCONF is not present, 
that's not an error condition on a node where the service group isn't presently 
online.

The rule, then, is don't issue log messages when you determine a resource is 
offline.  You'll be issuing log messages every 5 minutes on the offline nodes. 
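
A monitor that follows this rule might look like the sketch below. The exit codes 110 (online) and 100 (offline) are the ones quoted in this thread; the pid-file check and the monitor_app name are assumptions made for illustration, not the poster's actual script.

```shell
#!/bin/sh
# Monitor sketch: report state through the exit code only.
# 110 = online, 100 = offline. Nothing is logged for the offline case,
# because offline is the normal state on nodes not running the group.
# The pid-file approach and the function name are illustrative assumptions.
monitor_app() {
    mon_pidfile=$1
    # A missing pid file simply means "offline here", not an error.
    [ -f "$mon_pidfile" ] || return 100
    mon_pid=$(cat "$mon_pidfile" 2>/dev/null)
    [ -n "$mon_pid" ] || return 100
    # kill -0 sends no signal; it only tests whether the process exists.
    if kill -0 "$mon_pid" 2>/dev/null; then
        return 110
    fi
    return 100
}

# In a real monitor script the last lines would be something like:
#   monitor_app "$PIDFILE"
#   exit $?
```

Any unexpected transition to offline is then reported by the engine itself, as described in the replies above.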

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Charles Bueche
Sent: Thursday, August 16, 2007 9:54 AM
To: Jim Senicka
Cc: Gene Henriksen; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] what shall "monitor" do on nodes not running a SG ?

Hi,

You put me on the right track. I was doing this :

---
if [ ! -f "$HTTPDCONF" ] ; then
 VCSAG_LOG_MSG "C" "$HTTPDCONF missing, aborting" 1
 exit 100
fi
---

and of course $HTTPDCONF isn't available on nodes not running the service.

I should change the level to "notice" or "info".

Thanks for the help,
Charles


On Thursday, 16 August 2007, at 15:23, Jim Senicka wrote:

> The monitor should return offline (100) when offline.
> It actually sounds like the monitor itself is failing.
> Can you post the actual log message, and the actual monitor script?
>
> -Original Message-
> From: Charles Bueche [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 16, 2007 9:20 AM
> To: Jim Senicka; Gene Henriksen
> Cc: veritas-ha@mailman.eng.auburn.edu
> Subject: Re: [Veritas-ha] what shall "monitor" do on nodes not running 
> a SG ?
>
> Hi,
>
> Sorry, my explanation wasn't good. Let me try once again:
>
> 1. when a node runs a SG and its application, the local "monitor"  
> run by VCS returns "ONLINE, 110". This is normal.
>
> 2. when a node doesn't run a SG and its application (because it is 
> running on another node), the local "monitor" run by VCS returns 
> "OFFLINE, 100". It fills the log with "CRITICAL" errors, even if this 
> is a normal situation.
>
> In case (2), I can of course detect whether I'm the active SG for the 
> application, and return different values if yes or no. What is the 
> recommended practice in this case ?
>
> Charles
>
> On Thursday, 16 August 2007, at 13:45, Jim Senicka wrote:
>
>> The monitor should return offline when the app is offline on the node 
>> where the monitor is running.
>> Your monitor appears to not run when the app is not running.
>> Where is the monitor routine stored?
>> Does it utilize functions that are only available when the app 
>> storage is present?
>>
>>
>>
>> -Original Message-
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of 
>> Charles Bueche
>> Sent: Thursday, August 16, 2007 7:29 AM
>> To: veritas-ha@mailman.eng.auburn.edu
>> Subject: [Veritas-ha] what shall "monitor" do on nodes not running a 
>> SG ?
>>
>> Hi,
>>
>> We have a 4-node cluster running on Linux. We develop our own 
>> scripts to use with the VCS Application Agent. The scripts have the 
>> usual start/stop/clean/monitor functions.
>>
>> What should the "monitor" function do when the service to check is 
>> running on another node ?
>>
>> If I do nothing, it returns with:
>>
>> VCSAG_LOG_MSG "C" "Checking $APPNAME on $HOSTNAME failed, OFFLINE" 7
>> exit 100
>>
>> The log then fills up with "...VCS CRITICAL V-16-2-7 
>> __:Checking  on  failed, OFFLINE"
>>
>> Should I identify if the SG is local and just return something like
>>
>>  VCSAG_LOG_MSG "I" "$APPNAME not running on $HOSTNAME" 7
>>  exit ???
>>
>> but then, what exit code is appropriate ?
>>
>> TIA,
>> Charles
>> ___
>> Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu 
>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha


___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu 
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha


Re: [Veritas-ha] what shall "monitor" do on nodes not running a SG ?

2007-08-16 Thread Jim Senicka

The monitor should return offline (100) when offline.
It actually sounds like the monitor itself is failing.
Can you post the actual log message, and the actual monitor script?

-Original Message-
From: Charles Bueche [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 16, 2007 9:20 AM
To: Jim Senicka; Gene Henriksen
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] what shall "monitor" do on nodes not running a SG ?

Hi,

Sorry, my explanation wasn't good. Let me try once again:

1. when a node runs a SG and its application, the local "monitor" run by VCS 
returns "ONLINE, 110". This is normal.

2. when a node doesn't run a SG and its application (because it is running on 
another node), the local "monitor" run by VCS returns "OFFLINE, 100". It fills 
the log with "CRITICAL" errors, even if this is a normal situation.

In case (2), I can of course detect whether I'm the active SG for the 
application, and return different values if yes or no. What is the recommended 
practice in this case ?

Charles

On Thursday, 16 August 2007, at 13:45, Jim Senicka wrote:

> The monitor should return offline when the app is offline on the node 
> where the monitor is running.
> Your monitor appears to not run when the app is not running.
> Where is the monitor routine stored?
> Does it utilize functions that are only available when the app storage 
> is present?
>
>
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Charles Bueche
> Sent: Thursday, August 16, 2007 7:29 AM
> To: veritas-ha@mailman.eng.auburn.edu
> Subject: [Veritas-ha] what shall "monitor" do on nodes not running a 
> SG ?
>
> Hi,
>
> We have a 4-node cluster running on Linux. We develop our own scripts
> to use with the VCS Application Agent. The scripts have the usual
> start/stop/clean/monitor functions.
>
> What should the "monitor" function do when the service to check is
> running on another node ?
>
> If I do nothing, it returns with:
>
> VCSAG_LOG_MSG "C" "Checking $APPNAME on $HOSTNAME failed, OFFLINE" 7
> exit 100
>
> The log then fills up with "...VCS CRITICAL V-16-2-7
> __:Checking  on  failed, OFFLINE"
>
> Should I identify if the SG is local and just return something like
>
>  VCSAG_LOG_MSG "I" "$APPNAME not running on $HOSTNAME" 7
>  exit ???
>
> but then, what exit code is appropriate ?
>
> TIA,
> Charles
> ___
> Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

-- 
Charles Bueche
netnea ag, www.netnea.com
gsm +41 79 330 0070



___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha

Re: [Veritas-ha] what shall "monitor" do on nodes not running a SG ?

2007-08-16 Thread Jim Senicka

The monitor should return offline when the app is offline on the node
where the monitor is running.
Your monitor appears to not run when the app is not running.
Where is the monitor routine stored?
Does it utilize functions that are only available when the app storage
is present?

 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Charles
Bueche
Sent: Thursday, August 16, 2007 7:29 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] what shall "monitor" do on nodes not running a SG
?

Hi,

We have a 4-node cluster running on Linux. We develop our own scripts
to use with the VCS Application Agent. The scripts have the usual
start/stop/clean/monitor functions.

What should the "monitor" function do when the service to check is
running on another node ?

If I do nothing, it returns with:

VCSAG_LOG_MSG "C" "Checking $APPNAME on $HOSTNAME failed, OFFLINE" 7
exit 100

The log then fills up with "...VCS CRITICAL V-16-2-7
__:Checking  on  failed, OFFLINE"

Should I identify if the SG is local and just return something like

 VCSAG_LOG_MSG "I" "$APPNAME not running on $HOSTNAME" 7
 exit ???

but then, what exit code is appropriate ?

TIA,
Charles
___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha


Re: [Veritas-ha] Fw: gab restarts had

2007-08-09 Thread Jim Senicka

A couple points.

- HAD getting recycled by GAB is due to HAD not heartbeating with GAB on the 
local box. This has pretty much nothing to do with the LLT heartbeat between boxes. 
- HAD not heartbeating with GAB indicates that HAD is either being swapped out due 
to extremely high load (it runs as a real-time process on Solaris) or 
blocking for some reason in an I/O call. This can really only happen if /var is 
full or write-protected, as there is some lock file activity there.
- The only way HAD could possibly be affected by the physical networks is if it 
were blocking on some piece of data that must be sent, but you would then also see 
lots of corresponding LLT alarms.

So, based on what I see here, HAD is not running correctly due to either a 
problem with /var or a load issue.
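
Both suspects can be checked quickly from the shell. This is a rough sketch only: it assumes "df -k" prints the usual one-line-per-file-system layout with the capacity in the fifth column (very long device names can wrap that line on some systems), and the function names are hypothetical.

```shell
#!/bin/sh
# Quick checks for the two suspects named above: /var and system load.
# parse_df_pct and check_var are hypothetical helper names.

# stdin: output of "df -k <path>"; prints the capacity column without "%".
parse_df_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

check_var() {
    cv_pct=$(df -k /var | parse_df_pct)
    echo "/var is ${cv_pct}% full"
    if [ -w /var ]; then
        echo "/var is writable"
    else
        echo "/var is NOT writable"
    fi
    uptime    # prints the 1, 5 and 15 minute load averages
}

# Usage, as root on the affected node:
#   check_var
```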


-Original Message-
From: Peter DrakeUnderkoffler [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 09, 2007 9:26 AM
To: Kiss László - Károly
Cc: Jim Senicka; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Fw: gab restarts had

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

But it shouldn't be halting the system now, gab will still kill had.

Do you have any llt errors or more importantly, any layer 2 errors with the 
heartbeat networks?

How do you know it's not the load?  What are you using to determine that?  What 
do you see in the /var/adm/messages or the output to dmesg a little after this 
is happening.  Those errors are a symptom of a system under too much load, but 
other things can cause that kind of symptom.  You need to actually start 
digging into the O/S layer and figure out what the system is doing.  The 
adjustment I mentioned to gabtab allows you that opportunity.

The other solution is to open a support call with Symantec and let them figure 
out what is going on.

Thanks
Peter

Peter DrakeUnderkoffler
Xinupro, LLC
617-834-2352



Kiss László - Károly wrote:
> Hi,
> I followed your instructions and edited gabtab, which now looks like:
> /sbin/gabconfig -c -k -n2
> 
> but still the HAD is restarted by the gab.
> 
> BR,
> Laszlo
> 
> - Original Message 
> From: Peter DrakeUnderkoffler <[EMAIL PROTECTED]>
> To: Jim Senicka <[EMAIL PROTECTED]>
> Cc: Kiss László - Károly <[EMAIL PROTECTED]>; 
> veritas-ha@mailman.eng.auburn.edu
> Sent: Wednesday, 8 August, 2007 5:33:32 PM
> Subject: Re: [Veritas-ha] Fw: gab restarts had
> 
> I agree with Jim, that is the failure scenario when the system is 
> overloaded and gab isn't able to communicate for a period of time.  As 
> a temporary measure, you can add a "-k" to gabtab and restart gab.  
> This will have it not force the system to panic giving you time to 
> resolve the underlying issue.
> I wouldn't leave this in place though
> 
> Thanks
> Peter
> 
> 
> Peter DrakeUnderkoffler
> Xinupro, LLC
> 617-834-2352
> 
> 
> 
> Jim Senicka wrote:
>>> is the system heavily loaded?
>>> GAB restarts HAD when HAD does not communicate with GAB for 16 seconds.
>>> This usually happens only in super overload situations
>>>
>>> 
>>> 
>>> *From:* [EMAIL PROTECTED]
>>> [mailto:[EMAIL PROTECTED] *On Behalf Of 
>>> *Kiss László - Károly
>>> *Sent:* Wednesday, August 08, 2007 11:04 AM
>>> *To:* veritas-ha@mailman.eng.auburn.edu
>>> *Subject:* [Veritas-ha] Fw: gab restarts had
>>>
>>> Sorry, I forgot the file :(
>>> Here it is
>>> - Forwarded Message 
>>> From: Kiss László - Károly <[EMAIL PROTECTED]>
>>> To: veritas-ha@mailman.eng.auburn.edu
>>> Sent: Wednesday, 8 August, 2007 5:02:42 PM
>>> Subject: Re: [Veritas-ha] gab restarts had
>>>
>>> Hi,
>>>
>>> We have a two node cluster, VCS 4.1.
>>>
>>> When I try to bring a resource online/offline or when I try to do 
>>> a switchover, I get some very strange behaviour. The gab daemon 
>>> restarts Veritas, and thus I can't do anything with it.
>>> I checked all the steps from the install guide's "Verifying LLT, GAB, 
>>> and Cluster Operation" chapter and everything looks fine, but when I 
>>> try to do something it just restarts. I attached the complete log of 
>>> a restart; here is a snippet from it:
>>>
>>> Aug  8 22:29:23 NTMS1AN1 gab: [ID 272231 kern.notice] GAB WARNING
>>> V-15-1-20057 Port h process 5182 inactive 14 sec
>>> Aug  8 22:29:24 NTMS1AN1 Had[5182]: [ID 702911 daemon.alert] VCS WARNING
>>> V-16-1-53024 HAD Signal SIGABRT received
>>> Aug  8 22:29:24 NTMS1AN1 Had[5182]: [ID 702911 daemon.alert] VCS NOTICE
>>>

Re: [Veritas-ha] Fw: gab restarts had

2007-08-09 Thread Jim Senicka

Do you have a support case open?
Other than load, I am unaware of anything in VCS 4.1 that can cause HAD to 
block.



From: Kiss László - Károly [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 09, 2007 3:19 AM
To: Jim Senicka; Peter DrakeUnderkoffler
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Fw: gab restarts had


VCS 4.1, Sun Solaris 9


- Original Message 
From: Jim Senicka <[EMAIL PROTECTED]>
To: Kiss László - Károly <[EMAIL PROTECTED]>; Peter DrakeUnderkoffler <[EMAIL 
PROTECTED]>
Cc: veritas-ha@mailman.eng.auburn.edu
Sent: Wednesday, 8 August, 2007 6:59:55 PM
Subject: RE: [Veritas-ha] Fw: gab restarts had


What OS and version, and what version of VCS?
Something is blocking HAD's ability to heartbeat to GAB.



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kiss László - 
Károly
Sent: Wednesday, August 08, 2007 11:38 AM
To: Peter DrakeUnderkoffler; Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Fw: gab restarts had


Thanks to both of you!

The system does not look loaded: an Oracle instance and a Java app are running 
on it, but the load is low, and this error comes up only when I try to do 
something with VCS.
I definitely would not leave this in place; that's why I would like some 
info on what to do in a situation like this.
Thanks.


- Original Message 
From: Peter DrakeUnderkoffler <[EMAIL PROTECTED]>
To: Jim Senicka <[EMAIL PROTECTED]>
Cc: Kiss László - Károly <[EMAIL PROTECTED]>; veritas-ha@mailman.eng.auburn.edu
Sent: Wednesday, 8 August, 2007 5:33:32 PM
Subject: Re: [Veritas-ha] Fw: gab restarts had


I agree with Jim, that is the failure scenario when the system is overloaded
and gab isn't able to communicate for a period of time.  As a temporary
measure, you can add a "-k" to gabtab and restart gab.  This will have it
not force the system to panic giving you time to resolve the underlying issue.
I wouldn't leave this in place though

Thanks
Peter


Peter DrakeUnderkoffler
Xinupro, LLC
617-834-2352



Jim Senicka wrote:
> is the system heavily loaded?
> GAB restarts HAD when HAD does not communicate with GAB for 16 seconds.
> This usually happens only in super overload situations
> 
> 
> *From:* [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] *On Behalf Of *Kiss
> László - Károly
> *Sent:* Wednesday, August 08, 2007 11:04 AM
> *To:* veritas-ha@mailman.eng.auburn.edu
> *Subject:* [Veritas-ha] Fw: gab restarts had
> 
> Sorry, I forgot the file :(
> Here it is
> - Forwarded Message 
> From: Kiss László - Károly <[EMAIL PROTECTED]>
> To: veritas-ha@mailman.eng.auburn.edu
> Sent: Wednesday, 8 August, 2007 5:02:42 PM
> Subject: Re: [Veritas-ha] gab restarts had
> 
> Hi,
> 
> We have a two node cluster, VCS 4.1.
> 
> When I try to bring a resource online or offline, or when I try to do a
> switchover, I get some very strange behaviour. The GAB daemon restarts
> VCS and thus I can't do anything with it.
> I checked all the steps from the install guide chapter "Verifying LLT, GAB,
> and Cluster Operation" and everything looks fine, but when I try to do
> something it just restarts. I attached the complete log of a restart;
> here is a snippet from it:
> 
> Aug  8 22:29:23 NTMS1AN1 gab: [ID 272231 kern.notice] GAB WARNING
> V-15-1-20057 Port h process 5182 inactive 14 sec
> Aug  8 22:29:24 NTMS1AN1 Had[5182]: [ID 702911 daemon.alert] VCS WARNING
> V-16-1-53024 HAD Signal SIGABRT received
> Aug  8 22:29:24 NTMS1AN1 Had[5182]: [ID 702911 daemon.alert] VCS NOTICE
> V-16-1-53028 Beginning execution of the diagnostics script
> Aug  8 22:29:24 NTMS1AN1 gab: [ID 191522 kern.notice] GAB WARNING
> V-15-1-20058 Port h process 5182: heartbeat failed, killing process
> 
> 
> Thanks.
> 
> BR,
> Laszlo
> 
> 
> 
> ___
> Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
---

Re: [Veritas-ha] Fw: gab restarts had

2007-08-08 Thread Jim Senicka

What OS and version, and what version of VCS?
Something is blocking HAD's ability to heartbeat to GAB.



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kiss László - 
Károly
Sent: Wednesday, August 08, 2007 11:38 AM
To: Peter DrakeUnderkoffler; Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Fw: gab restarts had


Thanks to both of you!

The system does not look loaded: an Oracle instance and a Java app are running 
on it, but the load is low, and this error comes up only when I try to do 
something with VCS.
I definitely would not leave this in place; that's why I would like some 
info on what to do in a situation like this.
Thanks.



Re: [Veritas-ha] Fw: gab restarts had

2007-08-08 Thread Jim Senicka

Is the system heavily loaded?
GAB restarts HAD when HAD does not communicate with GAB for 16 seconds.
This usually happens only in super overload situations
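
To see how often this is happening, the warnings can be pulled out of syslog. What follows is a minimal sketch, assuming the two GAB message IDs (V-15-1-20057 and V-15-1-20058) and the wording shown in the log excerpt in this thread; the function name and the summary layout are made up for illustration and are not part of VCS.

```python
import re

# Patterns based only on the two GAB messages quoted in this thread.
# \s+ also matches a line break, so log lines wrapped by a mail client still match.
INACTIVE = re.compile(r"V-15-1-20057\s+Port (\w) process (\d+) inactive (\d+) sec")
KILLED = re.compile(r"V-15-1-20058\s+Port (\w) process (\d+): heartbeat failed")


def summarize_gab_warnings(log_text):
    """Return {pid: {"port", "max_inactive", "killed"}} for the given log text."""
    summary = {}

    def entry_for(pid, port):
        # Create the per-process record on first sight of that PID.
        return summary.setdefault(
            int(pid), {"port": port, "max_inactive": 0, "killed": False}
        )

    for port, pid, secs in INACTIVE.findall(log_text):
        entry = entry_for(pid, port)
        entry["max_inactive"] = max(entry["max_inactive"], int(secs))
    for port, pid in KILLED.findall(log_text):
        entry_for(pid, port)["killed"] = True
    return summary
```

Feeding it the contents of the messages file gives one record per HAD process ID, which makes it easier to tell a single incident from a restart loop.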

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kiss László - 
Károly
Sent: Wednesday, August 08, 2007 11:04 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Fw: gab restarts had

Sorry, I forgot the file :(
Here it is

- Forwarded Message 
From: Kiss László - Károly <[EMAIL PROTECTED]>
To: veritas-ha@mailman.eng.auburn.edu
Sent: Wednesday, 8 August, 2007 5:02:42 PM
Subject: Re: [Veritas-ha] gab restarts had

Hi,

We have a two node cluster, VCS 4.1.

When I try to bring a resource online or offline, or when I try to do a 
switchover, I get some very strange behaviour. The GAB daemon restarts 
VCS and thus I can't do anything with it. 
I checked all the steps from the install guide chapter "Verifying LLT, GAB, and 
Cluster Operation" and everything looks fine, but when I try to do something it 
just restarts. I attached the complete log of a restart; here is a snippet from 
it:

Aug  8 22:29:23 NTMS1AN1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 
Port h process 5182 inactive 14 sec
Aug  8 22:29:24 NTMS1AN1 Had[5182]: [ID 702911 daemon.alert] VCS WARNING 
V-16-1-53024 HAD Signal SIGABRT received
Aug  8 22:29:24 NTMS1AN1 Had[5182]: [ID 702911 daemon.alert] VCS NOTICE 
V-16-1-53028 Beginning execution of the diagnostics script
Aug  8 22:29:24 NTMS1AN1 gab: [ID 191522 kern.notice] GAB WARNING V-15-1-20058 
Port h process 5182: heartbeat failed, killing process

Thanks.

BR,
Laszlo


Re: [Veritas-ha] LLT messages

2007-07-23 Thread Jim Senicka

This usually means an excessively high load on the box is keeping LLT from
getting scheduled properly.
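
To gauge how long the scheduling gaps are, the tick counts can be extracted from syslog. This is a minimal sketch, assuming the message ID and wording from the log line quoted in this thread. The conversion of 100 ticks per second is an assumption I have not checked against this release, so it is a parameter rather than a constant; the function name is hypothetical.

```python
import re

# Pattern based only on the LLT message quoted in this thread.
# \s+ also matches a line break, so wrapped log lines still match.
TIMER_MSG = re.compile(r"V-14-1-10035\s+timer not called for (\d+) ticks")


def llt_timer_gaps(log_text, ticks_per_second=100):
    """Return the reported gaps in seconds, in log order.

    ticks_per_second=100 is an assumption, not a documented value.
    """
    return [int(ticks) / ticks_per_second for ticks in TIMER_MSG.findall(log_text)]
```

A handful of short gaps during a backup window reads very differently from long gaps all day, so listing them is a reasonable first step before looking at the load itself.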
 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Evsyukov, Sergey
Sent: Monday, July 23, 2007 2:10 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] LLT messages


Hello colleagues,
can anybody say what the message below means? 
OS - Solaris9
VCS 4.1
 
Thank you very much, Sergey
 
Jul 21 22:37:58 sf25ka llt: [ID 678236 kern.notice] LLT INFO
V-14-1-10035 timer not called for 120 ticks

Re: [Veritas-ha] Licensing for Storage Foundation Enterprise HA, Cluster File System, Solaris, v5.0

2007-07-16 Thread Jim Senicka

Talk with your rep.
Pricing is available per processor or per system.
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Rongsheng Fang
Sent: Monday, July 16, 2007 11:44 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Licensing for Storage Foundation Enterprise HA,
Cluster File System, Solaris, v5.0

Does anybody know how Symantec licenses the following product?

Storage Foundation Enterprise HA, Cluster File System, Solaris, v5.0

Is it by "per processor", or "per system", or "per cluster"?

Thanks,

Rongsheng

Re: [Veritas-ha] LLT and GAB problem after first rebooting when configured

2007-06-21 Thread Jim Senicka

LLT is not starting, right? All the other data is irrelevant. Fix the LLT issue 
so GAB can start, and then HAD can start.
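
The ordering above (HAD needs GAB, GAB needs LLT) can be sketched as a small check that reports the lowest layer that is not running, since that is the one to fix first. The function name and the dictionary of states are hypothetical; how the running state of each layer is obtained is left out on purpose.

```python
# Start-up order as described in this thread: LLT first, then GAB, then HAD.
STACK = ["llt", "gab", "had"]


def first_layer_to_fix(running):
    """Return the lowest layer that is not running, or None if all are up.

    `running` maps layer name to True/False; a missing key counts as down.
    """
    for layer in STACK:
        if not running.get(layer, False):
            return layer
    return None
```

With everything down, the answer is "llt", which is why the GAB and HAD output in the original post does not help until LLT is sorted out.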




 -Original Message-
From:   Damodharan K [mailto:[EMAIL PROTECTED]
Sent:   Thursday, June 21, 2007 04:06 PM Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] LLT and GAB problem after first rebooting when configured


Dear all,

I have two V480 servers with VCS 4.1 and VxVM 4.1.

I am building a new two-node cluster. At installation and configuration the
cluster service worked fine, but after a reboot LLT and GAB are not running
and I am not able to start the cluster service. Please help me solve this
issue. I am sending the configuration and the engine log.


Engine_A.log

2007/04/18 14:13:44 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
2007/04/18 14:13:44 VCS ERROR V-16-1-10116 GabHandle::open failed errno = 261
2007/04/18 14:13:44 VCS ERROR V-16-1-11033 GAB open failed. Exiting
2007/04/18 14:13:54 VCS NOTICE V-16-1-11022 VCS engine (had) started
2007/04/18 14:13:54 VCS NOTICE V-16-1-11027 VCS engine startup arguments=-restar

Configurations

test02-ap: gabconfig -l
GAB Driver Configuration
Driver state         : Unconfigured
Partition arbitration: Disabled
Control port seed    : Enabled
Halt on process death: Disabled
Missed heartbeat halt: Disabled
Halt on rejoin       : Disabled
Keep on killing      : Disabled
Quorum flag          : Disabled
Restart              : Disabled
Node count           : 2
Disk HB interval (ms): 1000
Disk HB miss count   : 4
IOFENCE timeout (ms) : 15000
Stable timeout (ms)  : 5000



test02-ap:  more /etc/llttab
set-node test02-ap
set-cluster 70
link qfe2 /dev/qfe:2 - ether - -
link qfe7 /dev/qfe:7 - ether - -


test02-ap: more /etc/gabtab
/sbin/gabconfig -c -n2

test02-ap: gabconfig -a
GAB Port Memberships
===

test02-ap: more /etc/gabtab
/sbin/gabconfig -c -n2


test02-ap: more main.cf
include "types.cf"

cluster vcsdev-ap (
    UserNames = { admin = bopHojOlpKppNxpJom }
    ClusterAddress = "172.25.7.98"
    Administrators = { admin }
    CredRenewFrequency = 0
    UseFence = SCSI3
    CounterInterval = 5
    )

system test01-ap (
    Limits = { Processors = 4 }
    )

system test02-ap (
    Limits = { Processors = 4 }
    )

group ClusterService (
    SystemList = { test01-ap = 0, test02-ap = 1 }
    AutoStartList = { test01-ap, test02-ap }
    FailOverPolicy = Load
    AutoStartPolicy = Load
    OnlineRetryLimit = 3
    OnlineRetryInterval = 120
    Load = 4
    )

    IP webip (
        Device = ce0
        Address = "172.25.7.98"
        NetMask = "255.255.255.248"
        )

    NIC csgnic (
        Device = ce0
        )

    VRTSWebApp VCSweb (
        Critical = 0
        AppName = vcs
        InstallDir = "/opt/VRTSweb/VERITAS"
        TimeForOnline = 5
        RestartLimit = 3
        )

    VCSweb requires webip
    webip requires csgnic


    // resource dependency tree
    //
    //      group ClusterService
    //      {
    //      VRTSWebApp VCSweb
    //          {
    //          IP webip
    //              {
    //              NIC csgnic
    //              }
    //          }
    //      }




Damodharan K
Tata Consultancy Services
Mailto: [EMAIL PROTECTED]
Website: http://www.tcs.com

Re: [Veritas-ha] LLT not configured error after reboot

2007-06-21 Thread Jim Senicka

Is there actually a device in /dev called /dev/qfe:2?

I think this should be /dev/qfe2. 
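
One way to check this without guessing is to read the link lines out of llttab and test the device paths on the node. Below is a minimal sketch, assuming the three directives that appear in the llttab posted in this thread (set-node, set-cluster, link) and assuming lines starting with # are comments. Because I am not certain whether the ":2" suffix is part of the device node name or a unit number on a shared node, the check accepts either the literal path or the path with the suffix stripped; all function names are hypothetical.

```python
import os

# The llttab contents as posted in this thread.
SAMPLE_LLTTAB = """\
set-node test02-ap
set-cluster 70
link qfe2 /dev/qfe:2 - ether - -
link qfe7 /dev/qfe:7 - ether - -
"""


def parse_llttab(text):
    """Extract node name, cluster ID and (tag, device) pairs from llttab text."""
    config = {"node": None, "cluster": None, "links": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # '#' comments are an assumption
            continue
        fields = line.split()
        if fields[0] == "set-node" and len(fields) >= 2:
            config["node"] = fields[1]
        elif fields[0] == "set-cluster" and len(fields) >= 2:
            config["cluster"] = int(fields[1])
        elif fields[0] == "link" and len(fields) >= 3:
            config["links"].append((fields[1], fields[2]))
    return config


def device_candidates(device):
    """Return the literal path plus, if it has a ':unit' suffix, the base path."""
    base, sep, _unit = device.partition(":")
    return [device, base] if sep else [device]


def check_links(config, exists=os.path.exists):
    """Map each link tag to True if any candidate path for its device exists."""
    return {
        tag: any(exists(path) for path in device_candidates(device))
        for tag, device in config["links"]
    }
```

Running `check_links(parse_llttab(open("/etc/llttab").read()))` on the node shows which links point at nothing; a False for both qfe links would match the symptom of LLT failing to start.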




 -Original Message-
From:   Damodharan K [mailto:[EMAIL PROTECTED]
Sent:   Wednesday, June 20, 2007 10:34 PM Mountain Standard Time
To: robertinoau; veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] LLT not configured error after reboot

Hi all,


After unloading and reloading GAB,
it gives the following error:
# gabconfig -c -x
GAB gabconfig ERROR V-15-2-25015 LLT not configured

But LLT is correctly configured:
 test02-ap:  more /etc/llttab
> set-node test02-ap
> set-cluster 70
> link qfe2 /dev/qfe:2 - ether - -
> link qfe7 /dev/qfe:7 - ether - -
> 
> 
> test02-ap: more /etc/gabtab
> /sbin/gabconfig -c -n2
> 
> test02-ap: gabconfig -a
> GAB Port Memberships
> ===
> 
> test02-ap: more /etc/gabtab
> /sbin/gabconfig -c -n2
> 



Damodharan K
Tata Consultancy Services
Mailto: [EMAIL PROTECTED]
Website: http://www.tcs.com



From: robertinoau <[EMAIL PROTECTED]>
Sent: 06/21/2007 05:08 AM
To: Damodharan K <[EMAIL PROTECTED]>, veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] LLT and GAB problem after first rebooting when configured






Try this:


1.) Unload GAB

# gabconfig -U

2.) Restart GAB.

# gabconfig -c -x

3.) Finally restart HAD.

# hastart


Re: [Veritas-ha] LLT not configured error after reboot

2007-06-21 Thread Jim Senicka

GAB saying "LLT not configured" means LLT is not running. It is not saying that 
LLT is configured incorrectly in llttab.




 -Original Message-
From:   robertinoau [mailto:[EMAIL PROTECTED]
Sent:   Wednesday, June 20, 2007 11:40 PM Mountain Standard Time
To: Damodharan K; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] LLT not configured error after reboot

Try this:

/etc/rc2.d/S92gab start

Then 

gabconfig -c -x



Re: [Veritas-ha] VCS without Storage Foundation Suite

2007-05-29 Thread Jim Senicka

UFS, yes.
SVM, No
ZFS, no.

You would need to do custom agents for SVM and ZFS. Both will be
somewhat problematic from a cluster perspective. ZFS from the standpoint
that once you start breaking down storage into small enough chunks to
failover per application, most of the cool goes away. SVM from the
standpoint that it is just not as clean as the disk group model for
failover.

 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steven
Sim
Sent: Tuesday, May 29, 2007 7:48 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] VCS without Storage Foundation Suite

Hi All;

I am looking to setup a VCS cluster without the underlying Storage
Foundation Suite, either with ZFS or with SVM/ufs.

Are there ready made and supported agents for both the above?

Warmest Regards
Steven Sim




Fujitsu Asia Pte. Ltd.

Re: [Veritas-ha] VCS and VMWare question

2007-05-24 Thread Jim Senicka

Assuming you are running VCS inside the guest, then yes, this is possible.  
This would mean running standard VCS for Linux and not the VCS for VMware 
package. 


Sent from my Nokia E62 handheld by goodlink.


 -Original Message-
From:   Pavel A Tsvetkov [mailto:[EMAIL PROTECTED]
Sent:   Thursday, May 24, 2007 06:12 AM Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject:[Veritas-ha] VCS and VMWare question

Hello all!

It is well known that VCS works on VMware nodes. But the question is: is 
it possible to have one node under VMware (guest Linux) and the second node 
directly on a Linux host without VMware?

Thanks a lot!   Pavel.



Re: [Veritas-ha] Patching VCS 3.5

2007-05-23 Thread Jim Senicka

VCS does not support rolling upgrades in that way. You need to check the
release notes.
 
I would recommend taking VCS down and leaving the apps up while you
patch if at all possible
 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Paul
Hunter
Sent: Wednesday, May 23, 2007 2:34 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Patching VCS 3.5


I will be patching Solaris 8 and VCS 3.5 with
storage_solutions.3.5MP4.sol.tar_278582.gz. This is a two-node
active/passive cluster; VCS has not been patched before.
 
Has anyone run into any problems?
 
Also, I am planning to run the installvcspatch script on the passive node,
then fail over, then patch the other node. Does anyone see any
issues?
 
Thanks

Re: [Veritas-ha] HAD hear beat error

2007-05-17 Thread Jim Senicka

You are running a high enough CPU loading that HAD is not able to
heartbeat with GAB.
HAD runs as a real time process, so when this occurs you have a really
wedged box.
 
 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Damodharan K
Sent: Thursday, May 17, 2007 2:33 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] HAD hear beat error




Dear all, 
I have VCS 4.1 on Solaris 10 on two V440 servers. 


Can anyone help? I am getting the following errors in the engine log. Please
advise me what I need to do and how to troubleshoot.


devp-02-th: May 17 07:28:07 devp-02-th Had[21355]: [ID 702911
daemon.alert] VCS WARNING V-16-1-51047 HAD Self Check: Excessive delay
in the HAD heartbeat to GAB (10 seconds) 
May 17 07:28:08 devp-02-th Had[21355]: [ID 702911 daemon.alert] VCS
WARNING V-16-1-53024 HAD Signal SIGABRT received 
May 17 07:28:08 devp-02-th Had[21355]: [ID 702911 daemon.alert] VCS
NOTICE V-16-1-53028 Beginning execution of the diagnostics script 
May 17 07:30:57 devp-02-th Had[22359]: [ID 702911 daemon.alert] VCS
WARNING V-16-1-51047 HAD Self Check: Excessive delay in the HAD
heartbeat to GAB (10 seconds) 
May 17 07:31:01 devp-02-th Had[22359]: [ID 702911 daemon.alert] VCS
WARNING V-16-1-53024 HAD Signal SIGABRT received 
May 17 07:31:02 devp-02-th Had[22359]: [ID 702911 daemon.alert] VCS
NOTICE V-16-1-53028 Beginning execution of the diagnostics script 
May 17 07:31:14 devp-02-th Had[22359]: [ID 702911 daemon.alert] VCS
NOTICE V-16-1-53029 Completed execution of the diagnostics script 


Thanks
Damodharan K


Re: [Veritas-ha] sample for apache application

2007-04-30 Thread Jim Senicka

what OS?
The Linux 5.0 bundled agent reference guide has the Apache agent
documented, and I believe the other OS do as well
 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of osk
Sent: Monday, April 30, 2007 3:01 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] sample for apache application



Hi,
I am new to VCS. Can you give me an example of configuring Apache
as a resource?

Recommendations are welcome.

regards
Karthikeyan.N
-- 
winners don't do different things
they do things differently 

Re: [Veritas-ha] Naming conventions for VCS; VCS style guide?

2007-04-24 Thread Jim Senicka

comments below

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Colb,
Andrew
Sent: Tuesday, April 24, 2007 2:14 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Naming conventions for VCS; VCS style guide?

All,

We are about to initiate and upgrade several VCS clusters. The plan for
these upgrades will enable us to build/test in parallel with existing
production clusters and  to revisit our traditional cluster nomenclature
and naming conventions.  

Our configuration has a five-node production Solaris Veritas cluster at
our headquarters; we are building a four-node equivalent at our warm
business continuity site (active data replication). The two sites are
connected by a point-to-point DS-3; firewall rules allow one site to see
and interact with the other.

Our current VCS nomenclature is pretty much ad hoc. The new VCS
nomenclature would have the following structure:   stem_object## where
stem is either a functional name (e.g., db)  or a singular, universal
name (e.g., prod), object is dg or sg, and ## is a zero-padded numeric
for serialized differentiation.
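
A minimal Python sketch of the proposed stem_object## scheme follows. The function name `vcs_name` and the 0 to 99 serial range are assumptions made for illustration; the thread only specifies the stem, the dg/sg object, and a zero-padded number.

```python
def vcs_name(stem, obj, serial):
    """Build a name of the form stem_object##, for example db_sg01."""
    if obj not in ("dg", "sg"):
        raise ValueError("object must be 'dg' or 'sg'")
    if not 0 <= serial <= 99:
        # Two digits are assumed because the scheme is written as '##'.
        raise ValueError("serial must fit in two digits")
    return f"{stem}_{obj}{serial:02d}"


print(vcs_name("db", "sg", 1))
print(vcs_name("prod", "dg", 12))
```

Generating names from one function keeps the padding and separator consistent across both sites.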

Question 1: Can we use identical names for VCS diskgroups and service
groups at the two sites (HQ and Continuity) simultaneously? Host names
will, of course, be different. The clusters will have different cluster
IDs. If we do use identical names, will that create a problem if we move
on to Global Cluster Option and/or to VVR?  For example, if we have a
service group named "db_sg01" in both our headquarters cluster and our
business continuity cluster, will VCS complain? 

JS>> Yes, you can use identical names. There are no issues with identical names in separate clusters.

Question 2: Is there an advantage in Veritas management/administration
if all the stem names are the same? That is, if we replace existing stem
names such as db, auth, appsrv, etc with a single universal name such as
"prod", will we be gaining anything in exchange for giving up the
functional association? 

JS>>> Cluster Management Console either already has or will provide a
search function, so it all depends on how you want to search :-) 

Thanks in advance for any discussion, advice, ideas, guidance, and
warnings,

Andy Colb

Investment Company Institute


Re: [Veritas-ha] Resource Group Dependencies

2007-04-24 Thread Jim Senicka

We are not planning to support multiple child groups in VCS at this
time. Please have your account team contact me inside Symantec.

Also, what are you running for which a 40-second shutdown is too long?
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ceri
Davies
Sent: Tuesday, April 24, 2007 9:39 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Resource Group Dependencies

I note that there is a restriction that a group may have only one child
group; is there any future in which this might be relaxed?



As a usage case, this is why I want this: I have a multi-node cluster in
which I have multiple zones on each node, and fail applications over
between the zones.

I don't wish to use the configuration quoted in the User's Guide for
zones, as this configuration requires that, when a service group fails
over, that the zone be stopped on the failing node and then started on
the node that the group is failing over to.  This is bad as, in my
testing, starting a zone is very quick, but waiting for one to shut down
takes about 40 seconds.

Therefore, I'm eschewing this and have created a parallel resource group
that starts a zone on each node and have the application resource groups
simply configured with a firm local dependency on the zone resource
group; e.g. with an identically configured zone vleappp on each node, I
use:

  group vleappp_zones (
      SystemList = { clna = 0, clnb = 0 }
      Parallel = 1
      AutoStartList = { clna, clnb }
      )

      Zone vleappp_zone (
          ZoneName = vleappp
          )

  group vle_app_prod (
      SystemList = { clna = 1, clnb = 0 }
      )

      Application vleappp_apache (
          StartProgram = "/nondistinct/vle/application start"
          StopProgram = "/nondistinct/vle/application stop"
          PidFiles = {
              "/zones/local/roots/vleappp/root/nondistinct/vle/logs/httpd.pid" }
          ContainerName = vleappp
          )

      Mount vleappp_mount (
          )

      Blah otherstuff ()

      ...

      requires group vleappp_zones online local firm

This works perfectly for me, except now I want to add a global
dependency on the vle_ora_prod group as well.  Aargh.
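
The one-child-group restriction described above can be checked mechanically before editing a configuration. The Python sketch below scans main.cf-style text for `requires group` lines and reports any parent group with more than one child. The function names, the sample text, and the second `requires group` line with its `online global firm` keywords are hypothetical, and the limit of one reflects the restriction as stated in this thread rather than any particular release note.

```python
import re
from collections import defaultdict

GROUP_RE = re.compile(r"^\s*group\s+(\S+)\s*\(")
REQUIRES_RE = re.compile(r"^\s*requires\s+group\s+(\S+)")


def child_groups(main_cf_text):
    """Map each parent group to the child groups it requires."""
    deps = defaultdict(list)
    current = None
    for line in main_cf_text.splitlines():
        match = GROUP_RE.match(line)
        if match:
            current = match.group(1)
            continue
        match = REQUIRES_RE.match(line)
        if match and current is not None:
            deps[current].append(match.group(1))
    return dict(deps)


def groups_over_limit(main_cf_text, limit=1):
    """Return the parent groups that require more than `limit` child groups."""
    deps = child_groups(main_cf_text)
    return sorted(group for group, kids in deps.items() if len(kids) > limit)


# Hypothetical sample: the second dependency is the one the poster wants to add.
SAMPLE = """
group vleappp_zones (
    SystemList = { clna = 0, clnb = 0 }
    Parallel = 1
    )

group vle_app_prod (
    SystemList = { clna = 1, clnb = 0 }
    )

    requires group vleappp_zones online local firm
    requires group vle_ora_prod online global firm
"""
print(groups_over_limit(SAMPLE))
```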

I simply can't wait for the zones to shut down so is there some other
option?

Ceri
--
That must be wonderful!  I don't understand it at all.
  -- Moliere



Re: [Veritas-ha] Proxy resource in "status unknown"

2007-04-16 Thread Jim Senicka

The thinking is if you configure a resource pointing to an actual
running thing, and it is not configured correctly yet, then you would
not want VCS tearing it down.  

-Original Message-
From: Fred Grieco [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 16, 2007 12:22 PM
To: Fred Grieco; Jim Senicka; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Proxy resource in "status unknown"

This started when I copied the resource from another service group.  

As an aside, is there a reason why resources are defaulted to disabled
and critical when they are created or copied?
--- Fred Grieco <[EMAIL PROTECTED]> wrote:

> Ugh... I didn't see that.  That solved the problem.
> 
> Thanks,
> Fred

Re: [Veritas-ha] Proxy resource in "status unknown"

2007-04-16 Thread Jim Senicka

The actual NIC is not enabled, so the Proxy cannot probe. (at least that
is my first thought here)
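
The diagnosis above, a Proxy whose target resource is not enabled, can be looked for in a configuration file. The Python sketch below parses main.cf-style stanzas and lists every Proxy resource whose `TargetResName` points at a resource with `Enabled = 0`. The function names and the sample text are hypothetical, the parser only handles simple one-line attributes, and it only sees an explicit `Enabled = 0` line.

```python
import re

STANZA_RE = re.compile(r"^\s*(\w+)\s+([\w-]+)\s*\(\s*$")
ATTR_RE = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*$")


def parse_resources(main_cf_text):
    """Return {name: (type, attrs)} for each stanza found in the text."""
    resources = {}
    current = None
    for line in main_cf_text.splitlines():
        match = STANZA_RE.match(line)
        if match:
            current = match.group(2)
            resources[current] = (match.group(1), {})
            continue
        if line.strip() == ")":
            current = None
            continue
        match = ATTR_RE.match(line)
        if match and current is not None:
            resources[current][1][match.group(1)] = match.group(2).strip('"')
    return resources


def proxies_with_disabled_target(main_cf_text):
    """List (proxy, target) pairs where the target has Enabled = 0."""
    resources = parse_resources(main_cf_text)
    flagged = []
    for name, (res_type, attrs) in resources.items():
        if res_type != "Proxy":
            continue
        target = attrs.get("TargetResName")
        target_attrs = resources.get(target, ("", {}))[1]
        if target_attrs.get("Enabled") == "0":
            flagged.append((name, target))
    return flagged


# Sample modeled on the snippets posted in this thread.
SAMPLE = """
NIC nic1 (
    Enabled = 0
    Device = ce0
    )

Proxy NICProxycsg (
    Critical = 0
    TargetResName = nic1
    )

Proxy NIC-Proxy (
    Critical = 0
    TargetResName = nic1
    )
"""
print(proxies_with_disabled_target(SAMPLE))
```

Once a disabled target is found, enabling it (for example with `hares -modify nic1 Enabled 1` on a writable configuration) is the usual follow-up.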

 

-Original Message-
From: Fred Grieco [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 16, 2007 11:29 AM
To: Jim Senicka; veritas-ha@mailman.eng.auburn.edu
Subject: RE: [Veritas-ha] Proxy resource in "status unknown"

Here are the snippets from the main.cf.  There are three SGs, one with
the actual NIC resource and two with proxies.  Both proxies show the
"online status unknown" state.

group ClusterService (
    SystemList = { pa-ocsun-01 = 0, pa-ocsun-02 = 1 }
    AutoStartList = { pa-ocsun-01, pa-ocsun-02 }
    OnlineRetryLimit = 3
    OnlineRetryInterval = 120
    )

    IP webip (
        Device = ce0
        Address = "192.168.49.146"
        NetMask = "255.255.255.0"
        )
    ...

    Proxy NICProxycsg (
        Critical = 0
        TargetResName = nic1
        )


group VVR-Remote (
    SystemList = { pa-ocsun-01 = 0, pa-ocsun-02 = 1 }
    )
    ...
    IP replip (
        Critical = 0
        Device = ce0
        Address = "192.168.49.68"
        NetMask = "255.255.255.0"
        )

    NIC nic1 (
        Enabled = 0
        Device = ce0
        NetworkType = ether
        NetworkHosts = { "192.168.49.1" }
        )

    ...


group oc451 (
    SystemList = { pa-ocsun-01 = 0, pa-ocsun-02 = 1 }
    AutoStartList = { pa-ocsun-01, pa-ocsun-02 }
    )
    ...
    IP VIP (
        Critical = 0
        Device = ce0
        Address = "192.168.49.145"
        NetMask = "255.255.255.0"
        )
    ...
    Proxy NIC-Proxy (
        Critical = 0
        TargetResName = nic1
        )

...



Fred
--- Jim Senicka <[EMAIL PROTECTED]> wrote:

> Can you cut/paste main.cf sections?
>  
> 
> -Original Message-
> From: Fred Grieco [mailto:[EMAIL PROTECTED]
> Sent: Monday, April 16, 2007 9:30 AM
> To: Jim Senicka; veritas-ha@mailman.eng.auburn.edu
> Subject: RE: [Veritas-ha] Proxy resource in "status unknown"
> 
> Yes, with the same priorities.
> 
> --- Jim Senicka <[EMAIL PROTECTED]> wrote:
> 
> > Are the system lists for both service groups the
> same?
> >  
> > 
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED]
> > On Behalf Of Fred
> > Grieco
> > Sent: Monday, April 16, 2007 9:08 AM
> > To: veritas-ha@mailman.eng.auburn.edu
> > Subject: [Veritas-ha] Proxy resource in "status
> unknown"
> > 
> > I've set up a proxy resource that references a NIC
> resource in another
> 
> > service group.  The NIC resource is online, but
> the proxy resource
> > shows "Online|status unknown."
> > 
> > What does this mean in a Proxy resource?  And is
> there any way to
> > clear the unknown status?  This is on a live
> Oracle cluster so I don't
> 
> > have the opportunity to down everything, etc.
> > 
> > TIA,
> > 
> > Fred
> > 



Re: [Veritas-ha] Proxy resource in "status unknown"

2007-04-16 Thread Jim Senicka

Are the system lists for both service groups the same?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Fred
Grieco
Sent: Monday, April 16, 2007 9:08 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Proxy resource in "status unknown"

I've set up a proxy resource that references a NIC resource in another
service group.  The NIC resource is online, but the proxy resource shows
"Online|status unknown."  

What does this mean in a Proxy resource?  And is there any way to clear
the unknown status?  This is on a live Oracle cluster so I don't have
the opportunity to down everything, etc.

TIA,

Fred


Re: [Veritas-ha] VCS 5.0 / Solaris 10 Resource Controls / Oracle Agent

2007-04-11 Thread Jim Senicka

Bryan
Unfortunately, at this time the VCS 5.x agents are pretty much not
designed to work in an SRM environment. We are looking at what it will
take to support this 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bryan
Pepin
Sent: Tuesday, April 10, 2007 4:34 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] VCS 5.0 / Solaris 10 Resource Controls / Oracle
Agent

Hello,

While deploying Oracle 10g on top of SFRAC 5.0 on Solaris 10, I've
noticed the following issue with setting shared memory parameters for
Oracle. The Oracle Agent does not assume the project that I have
assigned to the oracle user. It assumes the system project instead, and
adding the resource controls to the system project or the default
project does not work either.

Here are the details:

Trying to use Solaris' new project methodology to establish the IPC
tunables, here is what I did:

# projadd -c 'IPC Tunables' -U oracle -G dba -K
'project.max-shm-memory=(privileged,16gb,deny)' user.oracle

Now, as the Oracle user, this allows the DB to open without issue.

However, when I configure the Oracle VCS agent to start the DB, it
appears that the VCS processes run in the "system" project, so the
database processes they start pick up the limits of that project rather
than those of the project I defined for the oracle user.

Here is the error in the messages file when the DB tries to open from
the VCS agent:

[ID 883052 kern.notice] privileged rctl project.max-shm-memory (value
6291603456) exceeded by project 0

So I logically thought I could apply the same tunings to the system
project, but that does not work either.

This is what my project file looks like:

system:0::::process.max-sem-nsems=(privileged,4096,deny);\
process.max-sem-ops=(privileged,4096,deny);project.max-sem-ids=(privileged,4096,deny);\
project.max-shm-ids=(privileged,512,deny);project.max-shm-memory=(privileged,17179869184,deny)
user.root:1::::
noproject:2::::
default:3::::
group.staff:10::::
user.oracle:100:IPC Tunables:oracle:dba:process.max-sem-nsems=(privileged,4096,deny);\
process.max-sem-ops=(privileged,4096,deny);project.max-sem-ids=(privileged,4096,deny);\
project.max-shm-ids=(privileged,512,deny);project.max-shm-memory=(privileged,17179869184,deny)
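
To double-check which limit a project entry actually carries, the entries can be parsed. The Python sketch below assumes the standard six-field /etc/project layout (name, ID, comment, users, groups, attributes) with each entry on a single line, and it reads only a single value tuple per attribute. The function names are hypothetical.

```python
def parse_project_line(line):
    """Split one /etc/project entry into its six colon-separated fields."""
    name, projid, comment, users, groups, attrs = line.split(":", 5)
    attributes = {}
    for item in filter(None, attrs.split(";")):
        key, _, value = item.partition("=")
        attributes[key] = value
    return {
        "name": name,
        "id": int(projid),
        "comment": comment,
        "users": [u for u in users.split(",") if u],
        "groups": [g for g in groups.split(",") if g],
        "attributes": attributes,
    }


def shm_limit_bytes(project):
    """Return the project.max-shm-memory limit in bytes, or None if unset."""
    value = project["attributes"].get("project.max-shm-memory")
    if value is None:
        return None
    # The value looks like (privileged,17179869184,deny); take the middle field.
    return int(value.strip("()").split(",")[1])


entry = (
    "user.oracle:100:IPC Tunables:oracle:dba:"
    "project.max-shm-ids=(privileged,512,deny);"
    "project.max-shm-memory=(privileged,17179869184,deny)"
)
project = parse_project_line(entry)
print(project["name"], shm_limit_bytes(project))
print(shm_limit_bytes(parse_project_line("system:0::::")))
```

Comparing the parsed limit with the value reported in the kernel message shows whether the running process is really using the project you expect.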

What I have been able to do is change the parameters on the fly with
prctl:

# ps -ef -o pid,project,args | grep -i OracleAgent   --> to get the PID and project
# prctl -n project.max-shm-memory -i process <pid>   --> to display
# prctl -n project.max-shm-memory -r -v 16gb -i process <pid>   --> to set

Once I do that, it allows me to start the database via the Oracle Agent.

Has anyone run into this issue?

This may be me not properly setting up the system project, but I figure
someone must have run into this and they could share how they resolved
it.

I'm hoping there is an easy solution out there, rather than having to
always change the parameter on the running Agent?

Hope that all makes sense.

Thanks.

-Bryan

PS. What I have realized is that if I put the shmmax parameters in
/etc/system, that works, but I was hoping not to have to fall back into
that routine.

--

Bryan Pepin
Unix Enterprise Systems

EMC Corporation
4400 Computer Drive
Westboro, MA 01580
508-898-4776
[EMAIL PROTECTED]


Re: [Veritas-ha] Step-by-Step instructions for adding storage to cluster

2007-04-11 Thread Jim Senicka

You need to get through VCS training to be honest. Without knowing every 
detail, I cannot give exact steps. With basic VCS training this would be 
trivial and you would be fully confident to make the changes




 -Original Message-
From:   Lynette Oliver [mailto:[EMAIL PROTECTED]
Sent:   Tuesday, April 10, 2007 08:04 PM Pacific Standard Time
To: Jim Senicka; veritas-ha@mailman.eng.auburn.edu
Subject:RE: [Veritas-ha] Step-by-Step instructions for adding storage 
to cluster

Thank you for your response, Jim.

Do you have the steps?  I've inherited a VCS configuration but have never
worked on it before. I'm afraid to make changes for fear of creating a
situation that could cause a failover.

 

  _  

From: Jim Senicka [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 10, 2007 7:33 PM
To: Lynette Oliver; veritas-ha@mailman.eng.auburn.edu
Subject: RE: [Veritas-ha] Step-by-Step instructions for adding storage to
cluster

 

if you add volumes you will need to add additional volume resources (if you
use volume resources) in the service group, plus whatever additional file
systems you add as additional file system resources

 

Growing file systems requires no changes in the cluster

 

  _  

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Lynette
Oliver
Sent: Wednesday, April 11, 2007 12:49 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Step-by-Step instructions for adding storage to
cluster

Hello HA GURUs,

I'm looking for someone to provide me with step-by-step instructions for
adding storage to a cluster.  For example, I have an existing cluster that
requires a new volume group to be added.  I have documentation to indicate
how to create volume groups and volumes using vxvm but nothing that
describes how to integrate this with an existing cluster.  In addition, if I
need to grow a filesystem for a given volume group managed by a cluster, how
do I do so?  Please help. This is VCS 4.1 on Solaris 2.9 running on Hitachi
USP.

 

Thanks,

loliver





Re: [Veritas-ha] Step-by-Step instructions for adding storage to cluster

2007-04-10 Thread Jim Senicka

If you add volumes, you will need to add matching Volume resources to
the service group (if you use Volume resources), plus a file system
(Mount) resource for each additional file system.
 
Growing file systems requires no changes in the cluster
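
The resource additions described above can be sketched as a command list. The Python helper below only prints a plausible sequence of `haconf` and `hares` commands for one new volume and its file system; it does not run anything. The function name, the resource naming pattern, and every argument value are hypothetical, and the attribute set (including the `%-y` escape for `FsckOpt`) is an assumption to verify against the VCS User's Guide and your existing main.cf before use.

```python
def add_volume_and_mount_cmds(group, diskgroup_res, diskgroup, volume,
                              mount_point, fstype="vxfs"):
    """Return the commands that would add a Volume and a Mount resource."""
    vol_res = f"{volume}_vol"
    mnt_res = f"{volume}_mnt"
    block_dev = f"/dev/vx/dsk/{diskgroup}/{volume}"
    return [
        "haconf -makerw",                                   # open the config
        f"hares -add {vol_res} Volume {group}",
        f"hares -modify {vol_res} Volume {volume}",
        f"hares -modify {vol_res} DiskGroup {diskgroup}",
        f"hares -add {mnt_res} Mount {group}",
        f"hares -modify {mnt_res} MountPoint {mount_point}",
        f"hares -modify {mnt_res} BlockDevice {block_dev}",
        f"hares -modify {mnt_res} FSType {fstype}",
        f"hares -modify {mnt_res} FsckOpt %-y",             # % escapes the dash
        f"hares -link {vol_res} {diskgroup_res}",           # volume needs disk group
        f"hares -link {mnt_res} {vol_res}",                 # mount needs volume
        f"hares -modify {vol_res} Enabled 1",
        f"hares -modify {mnt_res} Enabled 1",
        "haconf -dump -makero",                             # save and close
    ]


# All names below are made up for illustration.
for cmd in add_volume_and_mount_cmds("db_sg01", "db_dg_res", "db_dg01",
                                     "datavol", "/data"):
    print(cmd)
```

Printing the commands first gives a reviewable change plan, which suits a production cluster where an accidental failover is the main worry.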

  _  

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Lynette
Oliver
Sent: Wednesday, April 11, 2007 12:49 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Step-by-Step instructions for adding storage to
cluster



Hello HA GURUs,

I'm looking for someone to provide me with step-by-step instructions for
adding storage to a cluster.  For example, I have an existing cluster
that requires a new volume group to be added.  I have documentation to
indicate how to create volume groups and volumes using vxvm but nothing
that describes how to integrate this with an existing cluster.  In
addition, if I need to grow a filesystem for a given volume group
managed by a cluster, how do I do so?  Please help. This is VCS 4.1 on
Solaris 2.9 running on Hitachi USP.

 

Thanks,

loliver


Re: [Veritas-ha] load-balancing in VCS

2007-04-06 Thread Jim Senicka

No, you will need an IP per node and an off-the-shelf IP load balancer out 
front. This is a far more standard approach than pumping all traffic through 
one node and letting it forward to all the others in the cluster. The Sun 
Cluster feature is a serious case of marketecture versus a real feature.




 -Original Message-
From:   Rongsheng Fang [mailto:[EMAIL PROTECTED]
Sent:   Friday, April 06, 2007 10:29 AM Pacific Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject:[Veritas-ha] load-balancing in VCS

Hi,

Does VCS have (or support) functionality equivalent to the Scalable Data 
Service in Sun Cluster, which can balance the load between cluster nodes?

http://docs.sun.com/app/docs/doc/819-0579/6n30dc0nf?a=view

I know that in VCS the service instances can start/run on different 
cluster nodes in parallel mode, but can these service instances share 
the same virtual IP which can only be up on one node?

Thanks,

Rongsheng

Re: [Veritas-ha] SCSI-3 reservation over VIO with Veritas Volumes on AIX?

2007-04-06 Thread Jim Senicka

Fencing is implemented with SCSI-3 reservations, so it is not possible
over VIO.
 

  _  

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pavel A
Tsvetkov
Sent: Friday, April 06, 2007 7:04 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] SCSI-3 reservation over VIO with Veritas Volumes
on AIX?



Hello all! 

It is stated in the Release Notes for VCS5.0 for AIX that fencing is not
supported over VIO. What about SCSI-3 reservation? 
Is it possible for veritas disk group to make this kind of reservation
if it gets the disks from VIO ( VIO in that case does not put any
reservation of its own) ? 

 Regards, Pavel

Re: [Veritas-ha] Custom Agent

2007-04-05 Thread Jim Senicka

If you already have start/stop/monitor scripts, take a look at the
Application agent in the Bundled Agents Reference Guide (BARG). That
should cover about 98% of apps.
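
For the monitor piece, a script handed to the Application agent reports state through its exit code. The Python sketch below assumes the commonly documented convention of 110 for online and 100 for offline, which is not stated in this thread and should be confirmed in the reference guide; the function name and PID-file approach are hypothetical, and the process check assumes a Unix host.

```python
import os

ONLINE = 110   # assumed exit code for "online"
OFFLINE = 100  # assumed exit code for "offline"


def monitor(pid_file):
    """Return ONLINE if the PID recorded in pid_file is alive, else OFFLINE."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return OFFLINE          # missing or unreadable PID file
    if pid <= 0:
        return OFFLINE          # never signal a process group by accident
    try:
        os.kill(pid, 0)         # signal 0 only checks that the process exists
    except ProcessLookupError:
        return OFFLINE
    except PermissionError:
        return ONLINE           # process exists but belongs to another user
    return ONLINE


print(monitor("/nonexistent/app.pid"))
```

A real monitor script would finish with `sys.exit(monitor(path))` so that the agent sees the code as the process exit status.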

-Original Message-
From: Fred Butler [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 05, 2007 11:47 AM
To: 'Stanley, Jon'; veritas-ha@mailman.eng.auburn.edu; Jim Senicka
Subject: RE: [Veritas-ha] Custom Agent

Thanks Jon / Jim! I know you guys don't want to hear this but I write
these agents all the time for Sun Cluster and this is my first request
to do one for VCS. I already have the start / stop / monitor scripts
already created and I just needed the info to incorporate them into the
VCS Framework. I will have to write a clean script after I determine if
there are things like, shared memory, semaphores or lock files that need
to be cleaned up. 

Jon - "Agent Developers Guide" huh :-)! Next time I will RTFM I will
also read the document Jim sent me. Thanks again!

Regards,
Fred Butler
(484) 241-5912 (Cell #1)
(484) 903-4742 (Cell #2)
http://www.arch.com/ Pin#: 8778977117

-Original Message-
From: Stanley, Jon [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 05, 2007 11:06 AM
To: Fred Butler; veritas-ha@mailman.eng.auburn.edu
Subject: RE: [Veritas-ha] Custom Agent

Have you looked at the aptly named 'Agent Developers Guide'? :-)

Or maybe the Application agent does what you need it to do instead?  If
you can provide external scripts to do the online, offline, monitor, and
clean functionality, then that's all that you need... 

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Fred 
> Butler
> Sent: Thursday, April 05, 2007 14:45
> To: veritas-ha@mailman.eng.auburn.edu
> Subject: [Veritas-ha] Custom Agent
> 
> Team - I need to write a custom agent in VCS and I need to know what 
> manual has this information. Or - if someone has some notes on this 
> process they would like to share I would be very appreciative.
> 

Re: [Veritas-ha] Custom Agent

2007-04-05 Thread Jim Senicka

Programming by example is the best bet.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Stanley,
Jon
Sent: Thursday, April 05, 2007 11:06 AM
To: Fred Butler; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Custom Agent

Have you looked at the aptly named 'Agent Developers Guide'? :-)

Or maybe the Application agent does what you need it to do instead?  If
you can provide external scripts to do the online, offline, monitor, and
clean functionality, then that's all that you need... 

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Fred 
> Butler
> Sent: Thursday, April 05, 2007 14:45
> To: veritas-ha@mailman.eng.auburn.edu
> Subject: [Veritas-ha] Custom Agent
> 
> Team - I need to write a custom agent in VCS and I need to know what 
> manual has this information. Or - if someone has some notes on this 
> process they would like to share I would be very appreciative.
> 

Re: [Veritas-ha] Custom Agent

2007-04-05 Thread Jim Senicka

 
http://eval.symantec.com/mktginfo/products/White_Papers/High_Availability/agent_dev_by_example.pdf

Written by Tom Stephens and Eric Hennessey on my team. Fantastic stuff



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Fred
Butler
Sent: Thursday, April 05, 2007 10:45 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Custom Agent

Team - I need to write a custom agent in VCS and I need to know what
manual has this information. Or - if someone has some notes on this
process they would like to share I would be very appreciative.


Re: [Veritas-ha] SRDF agent over VIO for VCS cluster for AIX -supported ?

2007-04-04 Thread Jim Senicka

Best places to get these answers are your sales team..

  _  

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pavel A
Tsvetkov
Sent: Wednesday, April 04, 2007 4:19 AM
To: Veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] SRDF agent over VIO for VCS cluster for AIX
-supported ?

Hello all! 

This is my new question. :) 
Symantec people, I am sorry. :) 

Does the SRDF agent work over VIO when using a Solutions Enabler client
configuration with the SYMAPI server on the VIOs? 

Thanks!  Pavel. :)

Re: [Veritas-ha] LVMVG agent does work with VIO ! ! !

2007-04-02 Thread Jim Senicka

We have a number of issues with reservations, and breaking reservations
and such.
So as of now, if the HCL says not supported, it is not.
Please work with your account team to find out what can be done (if
anything) to get this added

  _  

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pavel A
Tsvetkov
Sent: Monday, April 02, 2007 9:43 AM
To: Veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] LVMVG agent does work with VIO ! ! !



Hello all! 

My  last post was a question about Symantec  support of LVMVG agent in
VIO configuration. It seems to me nobody could answer my question... 
So I decided to check it out myself. I installed VCS Cluster 5 MP1 for
AIX on my 570 server with two LPARs and two VIO-s. I  used only one VIO
in my configuration. 
 One disk was shared by VIO for  two LPAR-s.  The LVM group was created
and clustered. Everything was quite right! No problems with switching
over of the LVM group from one LPAR to another. 
 So I'd like very much  to get any comments from Symantec people ! 

  Regards, Pavel 

Re: [Veritas-ha] remove a VRTS license

2007-03-26 Thread Jim Senicka

have you looked at the options to vrtslic?

  _  

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kiss László - 
Károly
Sent: Monday, March 26, 2007 8:41 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] remove a VRTS license

Hi,

Is there any possibility to remove an installed license from a vcs 4.1?

Thanks.

BR,
Laszlo
