Re: Storage failure is not handled well in CS

Nik Martin Wed, 03 Oct 2012 06:52:10 -0700

Bump? This is a serious issue that I need to get resolved. An entire cloud going down while one SAN is being repaired is a bad thing. My cloud controller still refuses to start VMs because it cannot connect to a SAN that is in maintenance mode and is offline.


On 10/02/2012 03:12 PM, Nik Martin wrote:

I have two SANs connected to CS as primary storage. One is an HD-based
SAN with a single target and LUN; the other is an SSD SAN split
into two volumes, each connected with its own target and LUN. The HD SAN is
where all system VMs are stored (or they were before I added the HD SAN,
but I have no idea where the system VM volumes are stored). This
morning I had to do a semi-emergency shutdown of the SSD SAN, so I put
both LUNs in emergency maintenance mode in CS. CS shut down the entire
cloud, not just the volumes stored on the SSD SAN. The SAN is offline,
and CS shows it in maintenance mode, but NO VMs will start, and the CS
management log shows:

onnecting; event = AgentDisconnected; new status = Alert; old update
count = 959; new update count = 960]
2012-10-02 15:10:40,370 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentTaskPool-2:null) Notifying other nodes of to disconnect
2012-10-02 15:10:40,370 WARN  [cloud.resource.ResourceManagerImpl]
(AgentTaskPool-2:null) Unable to connect due to
com.cloud.exception.ConnectionException: Unable to connect to pool
Pool[204|IscsiLUN]
     at
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
     at java.lang.Thread.run(Thread.java:679)
Caused by: com.cloud.exception.StorageUnavailableException: Resource
[StoragePool:204] is unreachable: Unable establish connection from
storage head to storage pool 204 due to ModifyStoragePoolCommand add
XenAPIException:Can not see storage pool:
cfd3b016-d4d9-3bb9-b1f9-f31374c44185 from on
host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd
host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd pool:
172.16.10.15/iqn.2012-01:com.nfinausa.san2:mirror0/0
     at com.cloud.storage.StorageManagerImpl.connectHostToSharedPool(StorageManagerImpl.java:1567)
     at com.cloud.storage.listener.StoragePoolMonitor.processConnect(StoragePoolMonitor.java:88)
     ... 8 more
2012-10-02 15:10:40,371 DEBUG [cloud.host.Status] (AgentTaskPool-2:null)
Transition:[Resource state = Enabled, Agent event = AgentDisconnected,
Host id = 6, name = hv1]
2012-10-02 15:10:40,375 DEBUG [cloud.host.Status] (AgentTaskPool-2:null)
Agent status update: [id = 6; name = hv1; old status = Alert; event =
AgentDisconnected; new status = Alert; old update count = 960; new
update count = 961]


host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd pool:
172.16.10.15/iqn.2012-01:com.nfinausa.san2:mirror0/1 is the SAN that is
in maintenance mode, so why is CS still trying to connect to it? All my HVs
are in Alert state because of this.



--
Regards,

Nik

Nik Martin
VP Business Development
Nfina Technologies, Inc.
+1.251.243.0043 x1003
Relentless Reliability
