Re: INCR backups fail ! TSM 8.1.17 Windows Server and client

2023-08-20 Thread David L.A. De Leeuw
Hi all,

Apparently, this has nothing to do with SP at all!

The system holding the containers (Windows Server 2019 on ESXi) simply 
disconnects for 5 minutes!

The server does not answer pings during that time.

Once access is restored, a message appears in the event log:
"The system time has changed to 2023-08-20T19:05:05 from 2023-08-20T19:01:04"
It is not even a warning, just "Information".
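
For anyone who wants to quantify such a gap: below is a minimal Python sketch that reads the two timestamps out of the event text quoted above and reports the size of the jump. The regular expression is an assumption based only on the wording of this one message; other Windows versions or locales may phrase the event differently.

```python
import re
from datetime import datetime

# Pattern for the event text quoted in this post (assumed wording).
EVENT_RE = re.compile(r"The system time has changed to (\S+) from (\S+)")


def time_jump_seconds(event_text: str) -> float:
    """Return how far the clock moved (new time minus old time), in seconds."""
    match = EVENT_RE.search(event_text)
    if match is None:
        raise ValueError("not a system-time-change event")
    new_time = datetime.fromisoformat(match.group(1))
    old_time = datetime.fromisoformat(match.group(2))
    return (new_time - old_time).total_seconds()


event = ("The system time has changed to 2023-08-20T19:05:05 "
         "from 2023-08-20T19:01:04  ")
print(time_jump_seconds(event))  # 241.0, i.e. about four minutes
```

A jump of this size lines up with the outage seen from the outside, which is why the event is worth searching for even though it is only logged as information.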

I have no idea why this happens, but we will find out.
Thanks for your support!

David



-Original Message-
From: דוד דה ליאו 
Sent: Sunday, August 20, 2023 9:37 PM
To: ADSM: Dist Stor Manager 
Subject: RE: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and client

Hi Michael,

Thanks a lot.

The SP server is not on a VM, only the storage is. I am not the administrator 
of the server. 
We simply get a lot of backup storage as long as we provide the space for the 
containers.

Sure, we run a lot of sessions in parallel, as you said. I will try a run 
according to your recommendations.
One other thing I am testing: over a year ago we also had crashes. 
The 10 Gb optical network had hiccups, while our 1 Gb line worked fine.
I have just switched back to the 1 Gb line and will see what happens. 

Will keep you posted !

David



Re: INCR backups fail ! TSM 8.1.17 Windows Server and client

2023-08-20 Thread David L.A. De Leeuw
Hi Michael,

Thanks a lot.

The SP server is not on a VM, only the storage is. I am not the administrator 
of the server. 
We simply get a lot of backup storage as long as we provide the space for the 
containers.

Sure, we run a lot of sessions in parallel, as you said. I will try a run 
according to your recommendations.
One other thing I am testing: over a year ago we also had crashes. 
The 10 Gb optical network had hiccups, while our 1 Gb line worked fine.
I have just switched back to the 1 Gb line and will see what happens. 

Will keep you posted !

David


-Original Message-
From: ADSM: Dist Stor Manager  On Behalf Of Michael Prix
Sent: Sunday, August 20, 2023 9:04 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and client

Hello David,

  An *SP server in a VM is not the best setup, but it should nevertheless 
work, and it has done so in the past.

For the client: please show the dsm.opt. I suspect you are running several 
sessions from this client in parallel during a backup; stop that for the moment.
Start with a basic dsm.opt, disable the option "resourceutilization", if set, 
and set "memoryefficient yes" (or "diskcachem" if you like). If it still 
crashes with a plain dsm.opt, you should open a ticket with IBM.

-- 
Michael Prix





Re: INCR backups fail ! TSM 8.1.17 Windows Server and client

2023-08-20 Thread Michael Prix
Hello David,

  An *SP server in a VM is not the best setup, but it should nevertheless 
work, and it has done so in the past.

For the client: please show the dsm.opt. I suspect you are running several 
sessions from this client in parallel during a backup; stop that for the moment.
Start with a basic dsm.opt, disable the option "resourceutilization", if set, 
and set "memoryefficient yes" (or "diskcachem" if you like). If it still 
crashes with a plain dsm.opt, you should open a ticket with IBM.
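
To make the suggested change concrete, here is a rough Python sketch that takes the text of a dsm.opt, comments out "resourceutilization" and sets "memoryefficientbackup yes". It is only an illustration: the sample file content is hypothetical (only the node name comes from this thread), only the fully spelled-out option names are matched (abbreviated spellings are not handled), and it assumes that a leading asterisk marks a comment line in the options file.

```python
def simplify_dsm_opt(text: str) -> str:
    """Comment out resourceutilization and set memoryefficientbackup yes."""
    kept = []
    for line in text.splitlines():
        words = line.split()
        name = words[0].lower() if words else ""
        if name == "resourceutilization":
            kept.append("* " + line)   # disable it by turning it into a comment
        elif name == "memoryefficientbackup":
            continue                   # dropped here, re-added once below
        else:
            kept.append(line)          # comments, blank lines, other options
    kept.append("MEMORYEFFICIENTBACKUP YES")
    return "\n".join(kept) + "\n"


# Hypothetical dsm.opt content, for illustration only.
sample = """\
* hypothetical dsm.opt
NODENAME MEDFS2
RESOURCEUTILIZATION 10
"""
print(simplify_dsm_opt(sample))
```

Working on a copy of the file and keeping the original makes it easy to go back once the cause is found.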

-- 
Michael Prix




August 20, 2023 at 7:25 PM, "David L.A. De Leeuw"  wrote:


> 
> Hi Chavdar and Michael,
> 
> Thanks for your thoughts and help.
> 
> I added "memoryefficientbackup". 
> 
> But the sessions still keep crashing. Once a session crashes, I get a whole 
> lot of errors for storage pool directories, and in fact the whole pool 
> becomes unavailable. 
> I run "update stgpooldir ... access=readwrite" and everything is accessible again.
> Some of the containers are in the unavailable state and need an audit. 
> 
> Our container storage is on a Dell PowerEdge R730xd with 24 CPUs allocated, 
> 64 GB memory, and 110 TB disk. The disks are declared as VMDKs. The network is 
> on a 10Gb Intel 82588 card.
> Nothing I can see points to a lack of resources.
> 
> Everything worked fine until 4 days ago. That is why I suspected the Windows 
> updates, but the failures continue after I rolled them back, so that does not make sense.
> 
> I am quite at a loss where to look next ...
> 
> Thanks
> 
> David
> 
> [Server Side] .
> 20-08-2023 19:47:22 ANR0839I Session 197902 started for node MEDFS2 (WinNT)
>  (SSL medspice.bgu.ac.il[132.72.73.246]:53184) on
>  STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197902)
> 20-08-2023 19:47:26 ANR8592I Session 197903 connection is using protocol
>  TLSV13, cipher specification TLS_AES_256_GCM_SHA384,
>  certificate TSM Self-Signed Certificate. (SESSION:
>  197903)
> 20-08-2023 19:47:26 ANR0839I Session 197903 started for node MEDFS2 (WinNT)
>  (SSL medspice.bgu.ac.il[132.72.73.246]:53185) on
>  STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197903)
> 20-08-2023 19:47:55 ANR2012W Error encountered for storage pool directory:
>  \\medbackup.med.ad.bgu.ac.il\tsmc20 in storage pool:
>  CPOOL. (SESSION: 197881)
> 20-08-2023 19:47:55 ANR1181E sdtxn.c(1404): Data storage transaction
>  0:83236375 was aborted. (SESSION: 197881)
> 20-08-2023 19:47:55 ANR0204I The container state for
>  \\medbackup.med.ad.bgu.ac.il\tsmc17\18\1853.-
>  ncf is updated from AVAILABLE to UNAVAILABLE. (SESSION:
>  197883)
> 20-08-2023 19:47:55 ANR3660E An unexpected error occurred while opening or
>  writing to the container. Container
>  \\medbackup.med.ad.bgu.ac.il\tsmc17\18\1853.-
>  ncf in stgpool CPOOL has been marked as UNAVAILABLE and
>  should be audited to validate accessibility and content.
>  (SESSION: 197883)
> 
> [From the client side:]
> 
> During the incr of a large filespace:
> 
> Normal File--> 7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU - 
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
> general MRI data\For-Crop-T2W - coronal Copy.pptx ** Unsuccessful **
> ANS1228E Sending of object '\\medfs2\e$\medusers14\angel\17.8.23 BU - 
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
> general MRI data\For-Crop-T2W - coronal Copy.pptx' failed.
> ANS1311E Server out of data storage space
> 
> [I ran sel of the latest file. It failed because all containerdirs were 
> unavailable.]
> 
> ANS1804E Selective Backup processing of '\\medfs2\e$\medusers14\angel\17.8.23 
> BU - E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
> general MRI data\For-Crop-T2W - coronal Copy.pptx' finished with failures.
> 
> Total number of objects inspected: 1
> Total number of objects backed up: 0
> Total number of objects updated: 0
> Total number of objects rebound: 0
> Total number of objects deleted: 0
> Total number of objects expired: 0
> Total number of objects failed: 1
>  ...
> Network data transfer rate: 148.306,35 KB/sec
> Aggregate data transfer rate: 211,50 KB/sec
> Objects compressed by: 0%
> Total data reduction ratio: 0.23%
> Subfile objects reduced by: 0%
> Elapsed processing time: 00:00:32
> ANS1311E Server out of data storage space
> 
> [Then I updated the containerdirs to readwrite and ran the selective backup. 
> No problem]
> ---
> Protect> sel '\\medfs2\e$\medusers14\angel\17.8.23 BU - 
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
> general MRI data\For-Crop-T2W - coronal Copy.pptx'
> Selective Backu

Re: INCR backups fail ! TSM 8.1.17 Windows Server and client

2023-08-20 Thread David L.A. De Leeuw
Hi Chavdar and Michael,

Thanks for your thoughts and help.

I added "memoryefficientbackup". 

But the sessions still keep crashing. Once a session crashes, I get a whole 
lot of errors for storage pool directories, and in fact the whole pool 
becomes unavailable. 
I run "update stgpooldir ... access=readwrite" and everything is accessible again.
Some of the containers are in the unavailable state and need an audit. 

Our container storage is on a Dell PowerEdge R730xd with 24 CPUs allocated, 64 
GB memory, and 110 TB disk. The disks are declared as VMDKs. The network is on a 
10Gb Intel 82588 card.
Nothing I can see points to a lack of resources.

Everything worked fine until 4 days ago. That is why I suspected the Windows 
updates, but the failures continue after I rolled them back, so that does not make sense.

I am quite at a loss where to look next ...

Thanks

David

[Server Side] .
20-08-2023 19:47:22  ANR0839I Session 197902 started for node MEDFS2 (WinNT)
  (SSL medspice.bgu.ac.il[132.72.73.246]:53184) on
  STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197902)
20-08-2023 19:47:26  ANR8592I Session 197903 connection is using protocol
  TLSV13, cipher specification TLS_AES_256_GCM_SHA384,
  certificate TSM Self-Signed Certificate. (SESSION:
  197903)
20-08-2023 19:47:26  ANR0839I Session 197903 started for node MEDFS2 (WinNT)
  (SSL medspice.bgu.ac.il[132.72.73.246]:53185) on
  STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197903)
20-08-2023 19:47:55  ANR2012W Error encountered for storage pool directory:
  \\medbackup.med.ad.bgu.ac.il\tsmc20 in storage pool:
  CPOOL. (SESSION: 197881)
20-08-2023 19:47:55  ANR1181E sdtxn.c(1404): Data storage transaction
  0:83236375 was aborted. (SESSION: 197881)
20-08-2023 19:47:55  ANR0204I The container state for
  \\medbackup.med.ad.bgu.ac.il\tsmc17\18\1853.-
  ncf is updated from AVAILABLE to UNAVAILABLE. (SESSION:
  197883)
20-08-2023 19:47:55  ANR3660E An unexpected error occurred while opening or
  writing to the container. Container
  \\medbackup.med.ad.bgu.ac.il\tsmc17\18\1853.-
  ncf in stgpool CPOOL has been marked as UNAVAILABLE and
  should be audited to validate accessibility and content.
  (SESSION: 197883)
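
As an aside, the containers that went UNAVAILABLE can be pulled out of a saved activity log excerpt like the one above with a few lines of Python. This is only a sketch under assumptions: it treats every line that does not start with a timestamp as a continuation of the previous record, and it assumes that a trailing "-" marks a word split by the display wrapping (so "1853.-" plus "ncf" is read as "1853.ncf"). The generated "audit container ... action=scanall" lines simply follow the command used earlier in this thread; review them before running anything.

```python
import re

TIMESTAMP_RE = re.compile(r"\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}")
CONTAINER_RE = re.compile(
    r"ANR0204I The container state for (\S+) is updated from \w+ to UNAVAILABLE"
)


def unwrap(log_text: str) -> list[str]:
    """Join wrapped continuation lines into one string per log record."""
    records: list[str] = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TIMESTAMP_RE.match(line):
            records.append(line)
        elif records:
            prev = records[-1]
            if prev.endswith("-"):
                records[-1] = prev[:-1] + line  # assumed mid-word wrap
            else:
                records[-1] = prev + " " + line
    return records


def unavailable_containers(log_text: str) -> list[str]:
    """Return container paths reported as UNAVAILABLE, without duplicates."""
    found: list[str] = []
    for record in unwrap(log_text):
        match = CONTAINER_RE.search(record)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


sample = r"""
20-08-2023 19:47:55 ANR0204I The container state for
 \\medbackup.med.ad.bgu.ac.il\tsmc17\18\1853.-
 ncf is updated from AVAILABLE to UNAVAILABLE. (SESSION:
 197883)
"""
for name in unavailable_containers(sample):
    print(f"audit container {name} action=scanall")
```
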

[From the client side:]

During the incr of a large filespace:

Normal File--> 7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU - 
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
general MRI data\For-Crop-T2W - coronal Copy.pptx  ** Unsuccessful **
ANS1228E Sending of object '\\medfs2\e$\medusers14\angel\17.8.23 BU - 
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
general MRI data\For-Crop-T2W - coronal Copy.pptx' failed.
ANS1311E Server out of data storage space

[I ran sel of the latest file. It failed because all containerdirs were 
unavailable.]

ANS1804E Selective Backup processing of '\\medfs2\e$\medusers14\angel\17.8.23 
BU - E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
general MRI data\For-Crop-T2W - coronal Copy.pptx' finished with failures.


Total number of objects inspected:   1
Total number of objects backed up:   0
Total number of objects updated:     0
Total number of objects rebound:     0
Total number of objects deleted:     0
Total number of objects expired:     0
Total number of objects failed:      1
 ...
Network data transfer rate:          148.306,35 KB/sec
Aggregate data transfer rate:        211,50 KB/sec
Objects compressed by:               0%
Total data reduction ratio:          0.23%
Subfile objects reduced by:          0%
Elapsed processing time:             00:00:32
ANS1311E Server out of data storage space

[Then I updated the containerdirs to readwrite and ran the selective backup. No 
problem]
---
Protect> sel '\\medfs2\e$\medusers14\angel\17.8.23 BU - 
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
general MRI data\For-Crop-T2W - coronal Copy.pptx'
Selective Backup function invoked.

Normal File--> 7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU - 
E\MyDocs(

Re: INCR backups fail ! TSM 8.1.17 Windows Server and client

2023-08-20 Thread Chavdar Cholev
Just to make sure that we are on the same page...
You have TSM installed on a VM running on VMware. This VM has a few LUNs
presented, and those LUNs are used for the containers?

A shot in the dark:
1. Check whether the VM resources match the IBM TSM Blueprint.
2. Check the LUN/HDD response times in Performance Monitor. The response time
should be around 20-30 ms during the backup operation.
3. Do you know whether the HDDs for those LUNs are .vmdk or RDM (raw device
mapping)?
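
For the response-time check in point 2, a tiny Python sketch of the comparison, assuming the latency samples have already been exported from the monitoring tool and converted to milliseconds. The readings below are hypothetical and are not measurements from this system.

```python
def slow_samples(latencies_ms, limit_ms=30.0):
    """Return the samples that exceed the upper end of the suggested range."""
    return [ms for ms in latencies_ms if ms > limit_ms]


# Hypothetical readings in milliseconds, for illustration only.
samples = [12.0, 25.5, 31.2, 140.0, 22.1]
slow = slow_samples(samples)
print(f"{len(slow)} of {len(samples)} samples above 30 ms: {slow}")
```
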

Thank you!
Chavdar

On Saturday, August 19, 2023, David L.A. De Leeuw  wrote:

> Hi TSM experts,
>
> Our incr backup has been failing consistently for the last few days. It starts
> all right, but after a few gigabytes the client reports the error:
>
> ANS1301E This operation cannot continue due to an error on the IBM
> Spectrum Protect server. See your IBM Spectrum Protect server administrator
> for assistance.
>
> On the server side we see:
>
> 18-08-2023 22:57:25 ANR2012W Error encountered for storage pool directory:
> \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> CPOOL. (SESSION: 194578)
> 18-08-2023 22:57:25 ANR0530W Transaction failed for session 194578 for node
> MEDFS2 (WinNT) - internal server error detected.
> (SESSION: 194578)
> 18-08-2023 22:57:26 ANR2012W Error encountered for storage pool directory:
> \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> CPOOL. (SESSION: 194578)
>
>
> Then we find one or more containers unavailable. We fix the containers
> with "audit container ... action=scanall"
> No errors are found. But the next backup will fail again.
>
> The server is on 8.1.17, the client as well.
> The containers are on a number of disks on a shared Windows Server 2019.
> There have been some updates on the Windows server recently.
> (KB5029247,KB5029647)
>
> The audits are fine, data is accessible, but backups fail.
> Any ideas ?
>
> David de Leeuw
> Ben-Gurion University of the Negev
> Beer Sheva Israel
>
>


Re: INCR backups fail ! TSM 8.1.17 Windows Server and client

2023-08-20 Thread Michael Prix
Hello David,

  If partial incremental backups work but full incrementals fail, this points 
to a problem with the client.
Have you already tried the client option "memoryefficientbackup" for this 
client? It might be that the number of files has grown over time and has just 
hit the memory limits of the client.
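
A back-of-the-envelope Python sketch of why the file count matters, using the roughly 25 million files mentioned in this thread. The per-object figure of 300 bytes is purely an assumed placeholder for illustration, not a measured or documented value for this client; substitute whatever figure applies to your version.

```python
# Assumed placeholder: bytes of client memory held per scanned object.
BYTES_PER_OBJECT = 300


def estimated_client_memory_gib(object_count, bytes_per_object=BYTES_PER_OBJECT):
    """Rough memory needed to hold the whole object list at once, in GiB."""
    return object_count * bytes_per_object / 2**30


# About 25 million files, as reported for the failing filespace.
print(round(estimated_client_memory_gib(25_000_000), 2))  # 6.98
```

Under that assumption the full incremental needs several GiB just for the file list, while a partial incremental over a subdirectory needs far less, which would fit the symptom of partial runs succeeding.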

Always remember: IBM error messages aren't there to help YOU :-)

-- 
Michael Prix




August 20, 2023 at 10:46 AM, "David L.A. De Leeuw"  wrote:


> 
> Hi Chavdar,
> 
> For the containers we use a dedicated Windows Server (on VMware ESXi) with 
> just a bunch of disks on NTFS, each about 4 TB.
> The INCR of one specific client has crashed every time for the past few days. 
> There are about 25 million files there (40 TB).
> If I try an incremental backup on a part of the disk, it works fine.
> 
> On the server with the containers, I checked the events and the antivirus, and 
> uninstalled the August Windows updates. I did not find any problems.
> The only similar case I found was here:
> https://adsm.org/forum/index.php?threads/storagepool-containers-going-unavailable.32858/
> 
> That points to a memory problem on the Spectrum Protect server.
> Our system manager is on holiday this week, so I cannot check that further.
> 
> Thanks
> 
> David
> 


Re: INCR backups fail ! TSM 8.1.17 Windows Server and client

2023-08-20 Thread David L.A. De Leeuw
Hi Chavdar,

For the containers we use a dedicated Windows Server (on VMware ESXi) with 
just a bunch of disks on NTFS, each about 4 TB.
The INCR of one specific client has crashed every time for the past few days. 
There are about 25 million files there (40 TB).
If I try an incremental backup on a part of the disk, it works fine.

On the server with the containers, I checked the events and the antivirus, and 
uninstalled the August Windows updates. I did not find any problems.
The only similar case I found was here:
https://adsm.org/forum/index.php?threads/storagepool-containers-going-unavailable.32858/

That points to a memory problem on the Spectrum Protect server.
Our system manager is on holiday this week, so I cannot check that further.

Thanks

David


-Original Message-
From: ADSM: Dist Stor Manager  On Behalf Of Chavdar Cholev
Sent: Sunday, August 20, 2023 11:16 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and client

Hi David,
What kind of storage do you use for containers (SAN, NAS...)?



Re: INCR backups fail ! TSM 8.1.17 Windows Server and client

2023-08-20 Thread Chavdar Cholev
Hi David,
What kind of storage do you use for containers (SAN, NAS...)?

On Saturday, August 19, 2023, David L.A. De Leeuw  wrote:

> Hi TSM experts,
>
> Our incr backup has been failing consistently for the last few days. It starts
> all right, but after a few gigabytes the client reports the error:
>
> ANS1301E This operation cannot continue due to an error on the IBM
> Spectrum Protect server. See your IBM Spectrum Protect server administrator
> for assistance.
>
> On the server side we see:
>
> 18-08-2023 22:57:25 ANR2012W Error encountered for storage pool directory:
> \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> CPOOL. (SESSION: 194578)
> 18-08-2023 22:57:25 ANR0530W Transaction failed for session 194578 for node
> MEDFS2 (WinNT) - internal server error detected.
> (SESSION: 194578)
> 18-08-2023 22:57:26 ANR2012W Error encountered for storage pool directory:
> \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> CPOOL. (SESSION: 194578)
>
>
> Then we find one or more containers unavailable. We fix the containers
> with "audit container ... action=scanall"
> No errors are found. But the next backup will fail again.
>
> The server is on 8.1.17, the client as well.
> The containers are on a number of disks on a shared Windows Server 2019.
> There have been some updates on the Windows server recently.
> (KB5029247,KB5029647)
>
> The audits are fine, data is accessible, but backups fail.
> Any ideas ?
>
> David de Leeuw
> Ben-Gurion University of the Negev
> Beer Sheva Israel
>
>