Hi Chavdar and Michael,
Thanks for your thoughts and help.
I added "memoryefficientbackup".
But the sessions still keep crashing. Once a session crashes, I get a whole
bunch of errors for storage pool directories, and in fact the whole pool
becomes unavailable.
I run "update stgpooldir ... access=readwrite" and all is accessible again.
Some of the containers are in an unavailable state and need an audit.
Our container storage is on a Dell PowerEdge R730xd with 24 CPUs allocated, 64
GB memory and 110 TB disk. The disks are declared as VMDKs. Network is on a 10Gb
Intel 82588 card.
Nothing I can see points to a lack of resources.
Everything worked fine until 4 days ago. That is why I suspected the Windows
updates, but I have since rolled them back, so that does not make sense either.
I am quite at a loss where to look next ...
Thanks
David
[Server side:]
20-08-2023 19:47:22 ANR0839I Session 197902 started for node MEDFS2 (WinNT)
(SSL medspice.bgu.ac.il[132.72.73.246]:53184) on
STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197902)
20-08-2023 19:47:26 ANR8592I Session 197903 connection is using protocol
TLSV13, cipher specification TLS_AES_256_GCM_SHA384,
certificate TSM Self-Signed Certificate. (SESSION:
197903)
20-08-2023 19:47:26 ANR0839I Session 197903 started for node MEDFS2 (WinNT)
(SSL medspice.bgu.ac.il[132.72.73.246]:53185) on
STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197903)
20-08-2023 19:47:55 ANR2012W Error encountered for storage pool directory:
\\medbackup.med.ad.bgu.ac.il\tsmc20 in storage pool:
CPOOL. (SESSION: 197881)
20-08-2023 19:47:55 ANR1181E sdtxn.c(1404): Data storage transaction
0:83236375 was aborted. (SESSION: 197881)
20-08-2023 19:47:55 ANR0204I The container state for
\\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.ncf
is updated from AVAILABLE to UNAVAILABLE. (SESSION: 197883)
20-08-2023 19:47:55 ANR3660E An unexpected error occurred while opening or
writing to the container. Container
\\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.ncf
in stgpool CPOOL has been marked as UNAVAILABLE and should be audited to
validate accessibility and content. (SESSION: 197883)
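[To collect the containers that need an audit, a small script can pull them out
of a saved copy of the activity log. This is only a rough sketch in Python: the
function names are made up for illustration, and it assumes that a long path
may be wrapped with a trailing "-" at the end of a line and that container
paths contain no spaces. It prints the same "audit container ...
action=scanall" command we have been running by hand.]

```python
import re

# Assumption: a long token wrapped by the log output ends its line with "-".
WRAP = re.compile(r"-[ \t]*\r?\n[ \t]*")
# Assumption: container paths contain no whitespace.
STATE_CHANGE = re.compile(
    r"ANR0204I The container state for (\S+) is updated from \w+ to UNAVAILABLE"
)


def unavailable_containers(log_text):
    """Return container paths that ANR0204I reports as moved to UNAVAILABLE."""
    joined = WRAP.sub("", log_text)   # undo hyphenated line wraps
    flat = " ".join(joined.split())   # collapse the remaining line breaks
    seen = []
    for path in STATE_CHANGE.findall(flat):
        if path not in seen:          # keep first-seen order, drop repeats
            seen.append(path)
    return seen


def audit_commands(paths):
    """Build one audit command per container (hypothetical helper)."""
    return [f"audit container {p} action=scanall" for p in paths]


SAMPLE = r"""20-08-2023 19:47:55 ANR0204I The container state for
\\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.-
ncf is updated from AVAILABLE to UNAVAILABLE.
(SESSION:
197883)
"""

if __name__ == "__main__":
    for cmd in audit_commands(unavailable_containers(SAMPLE)):
        print(cmd)
```

[The commands are only printed, not executed, so the list can be reviewed
before anything is pasted into an administrative session.]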
[From the client side:]
During the incr of a large filespace:
Normal File--> 7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU -
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
general MRI data\For-Crop-T2W - coronal Copy.pptx ** Unsuccessful **
ANS1228E Sending of object '\\medfs2\e$\medusers14\angel\17.8.23 BU -
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
general MRI data\For-Crop-T2W - coronal Copy.pptx' failed.
ANS1311E Server out of data storage space
[I ran a selective backup (sel) of the latest file. It failed because all
container directories were unavailable.]
ANS1804E Selective Backup processing of '\\medfs2\e$\medusers14\angel\17.8.23
BU - E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
general MRI data\For-Crop-T2W - coronal Copy.pptx' finished with failures.
Total number of objects inspected: 1
Total number of objects backed up: 0
Total number of objects updated: 0
Total number of objects rebound: 0
Total number of objects deleted: 0
Total number of objects expired: 0
Total number of objects failed: 1
...
Network data transfer rate: 148.306,35 KB/sec
Aggregate data transfer rate: 211,50 KB/sec
Objects compressed by: 0%
Total data reduction ratio: 0.23%
Subfile objects reduced by: 0%
Elapsed processing time: 00:00:32
ANS1311E Server out of data storage space
[Then I updated the container directories to readwrite and ran the selective
backup again. No problem.]
-----------------------------------------------------------------------------------------------------------
Protect> sel '\\medfs2\e$\medusers14\angel\17.8.23 BU -
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
general MRI data\For-Crop-T2W - coronal Copy.pptx'
Selective Backup function invoked.
Normal File--> 7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU -
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
general MRI data\For-Crop-T2W - coronal Copy.pptx [Sent]
Selective Backup processing of '\\medfs2\e$\medusers14\angel\17.8.23 BU -
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
general MRI data\For-Crop-T2W - coronal Copy.pptx' finished without failure.
-----Original Message-----
From: ADSM: Dist Stor Manager <[email protected]> On Behalf Of Chavdar Cholev
Sent: Sunday, August 20, 2023 3:43 PM
To: [email protected]
Subject: Re: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and client
Just to make sure that we are on the same page...
You have TSM installed on a VM running on VMware. This VM has a few LUNs
presented, and those LUNs are used for containers?
Shot in the dark:
1. Check whether the VM resources match the IBM TSM Blueprint.
2. Check the LUN/HDD response time in Performance Monitor. The response time
should be around 20-30 ms during the backup operation.
3. Do you know if the HDDs for those LUNs are .vmdk or RDM (raw device mapping)?
Thank you!
Chavdar
On Saturday, August 19, 2023, David L.A. De Leeuw <[email protected]> wrote:
> Hi TSM experts,
>
> Our incr backup has been failing consistently for the last few days. It
> starts all right, but after a few gigabytes we get this error on the client:
>
> ANS1301E This operation cannot continue due to an error on the IBM
> Spectrum Protect server. See your IBM Spectrum Protect server
> administrator for assistance.
>
> On the server side we see:
>
> 18-08-2023 22:57:25 ANR2012W Error encountered for storage pool directory:
> \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> CPOOL. (SESSION: 194578)
> 18-08-2023 22:57:25 ANR0530W Transaction failed for session 194578 for node
> MEDFS2 (WinNT) - internal server error detected. (SESSION: 194578)
> 18-08-2023 22:57:26 ANR2012W Error encountered for storage pool directory:
> \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> CPOOL. (SESSION: 194578)
>
>
> Then we find one or more containers unavailable. We fix the containers
> with "audit container ... action=scanall"
> No errors are found. But the next backup will fail again.
>
> The server is on 8.1.17, the client as well.
> The containers are on a number of disks on a shared Windows Server 2019.
> There have been some updates on the Windows server recently
> (KB5029247, KB5029647).
>
> The audits are fine, data is accessible, but backups fail.
> Any ideas ?
>
> David de Leeuw
> Ben-Gurion University of the Negev
> Beer Sheva Israel
>
>