Re: [Bacula-users] [Bacula-devel] Load slots timeout
On Nov 10, 2014, at 3:33 AM, Andrey Chebotarev a...@525.su wrote:

Hi. It's on devel because the question is about editing the sources, not about configuration. And I think only a developer could answer it.

The conclusion to change the code is premature.

On Nov 4, 2014, at 6:55 AM, Andrey Chebotarev a...@525.su wrote:

Hi guys. I use Bacula 5.2.13 with an IBM TS3200 library. Periodically I face a problem with a load slot timeout. In the logs it looks like:

22-Sep 00:35 baculasrv-dir JobId 10471: Recycled volume "220AAAL6"
22-Sep 00:35 baculasrv-sd JobId 10471: 3304 Issuing autochanger "load slot 1, drive 1" command.
22-Sep 00:40 baculasrv-sd JobId 10471: Fatal error: 3992 Bad autochanger "load slot 1, drive 1": ERR=Child died from signal 15: Termination. Results=Program killed by Bacula (timeout)
22-Sep 00:32 sqcompose-fd JobId 10471: Fatal error: backup.c:1019 Network send error to SD.

As I understand it, Bacula tries to load the slot for 5 minutes and stops the job if that fails. I started investigating why Bacula sometimes doesn't manage to load the slot in 5 minutes, and found that the library stops responding to commands and starts a drive cleaning procedure, which takes more than 5 minutes (the library was configured to clean the drive automatically). So how can I solve the problem? I found the place in the mtx-changer script where a 300-second wait is declared and increased the value to 1800 seconds. Is that the only place, or do I have to change something else in the sources?

Hi everybody. Increasing the timeout in the mtx-changer script hasn't helped; I have the same issue. Where in the sources can I increase the timeout to 30 minutes?

I don't see why this is on devel. It should be on users, which is cc'd here.

-- ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] [Bacula-devel] Load slots timeout
Mr. Andrey,

/etc/bacula/scripts/mtx-changer:

    wait_for_drive() {
      i=0
      while [ $i -le 300 ]; do  # Wait max 300 seconds
        if mt -f $1 status 2>&1 | grep "${ready}" >/dev/null 2>&1; then
          break
        fi
        debug "Device $1 - not ready, retrying..."
        sleep 1
        i=`expr $i + 1`
      done
    }

Regards,
Heitor Medrado de Faria
Only a few days left - Bacula Telepresence Training: http://www.bacula.com.br/?p=2174
61 2021-8260 | 8268-4220
Site: www.bacula.com.br | Facebook: heitor.faria | Gtalk: heitorfa...@gmail.com

- Original message -
From: Andrey Chebotarev a...@525.su
To: Dan Langille d...@langille.org
Cc: bacula-de...@lists.sourceforge.net, bacula-users bacula-users@lists.sourceforge.net
Sent: Monday, November 10, 2014 10:24:48
Subject: Re: [Bacula-devel] Load slots timeout

I wanted to change the code by myself. I just wanted to know where in the code the 5-minute timeout is defined.

On Nov 10, 2014, at 3:33 AM, Andrey Chebotarev a...@525.su wrote:

Hi. It's on devel because the question is about editing the sources, not about configuration. And I think only a developer could answer it.

The conclusion to change the code is premature.

On Nov 4, 2014, at 6:55 AM, Andrey Chebotarev a...@525.su wrote:

Hi guys. I use Bacula 5.2.13 with an IBM TS3200 library. Periodically I face a problem with a load slot timeout. In the logs it looks like:

22-Sep 00:35 baculasrv-dir JobId 10471: Recycled volume "220AAAL6"
22-Sep 00:35 baculasrv-sd JobId 10471: 3304 Issuing autochanger "load slot 1, drive 1" command.
22-Sep 00:40 baculasrv-sd JobId 10471: Fatal error: 3992 Bad autochanger "load slot 1, drive 1": ERR=Child died from signal 15: Termination. Results=Program killed by Bacula (timeout)
22-Sep 00:32 sqcompose-fd JobId 10471: Fatal error: backup.c:1019 Network send error to SD.

As I understand it, Bacula tries to load the slot for 5 minutes and stops the job if that fails. I started investigating why Bacula sometimes doesn't manage to load the slot in 5 minutes, and found that the library stops responding to commands and starts a drive cleaning procedure, which takes more than 5 minutes (the library was configured to clean the drive automatically). So how can I solve the problem? I found the place in the mtx-changer script where a 300-second wait is declared and increased the value to 1800 seconds. Is that the only place, or do I have to change something else in the sources?

Hi everybody. Increasing the timeout in the mtx-changer script hasn't helped; I have the same issue. Where in the sources can I increase the timeout to 30 minutes?

I don't see why this is on devel. It should be on users, which is cc'd here.

-- ___ Bacula-devel mailing list bacula-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-devel
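The wait_for_drive loop quoted in this message hard-codes its 300-second limit. A minimal sketch of the same loop with the limit read from a variable might look like the following. The names max_wait and MAX_DRIVE_WAIT, the "ONLINE" ready string, and the debug stub are assumptions for illustration and are not part of the shipped script. As reported in this thread, raising the script-side limit alone did not cure the failure, so this covers only one of the timeouts involved.

```shell
#!/bin/sh
# Sketch only: a variant of the quoted wait_for_drive with a tunable limit.
# MAX_DRIVE_WAIT and max_wait are hypothetical names, not Bacula settings.
max_wait=${MAX_DRIVE_WAIT:-300}

# Assumed "drive ready" marker in `mt status` output; it differs per OS.
ready=${ready:-ONLINE}

# Stand-in for the script's own debug helper; writes to stderr.
debug() { echo "$*" >&2; }

wait_for_drive() {
  i=0
  while [ "$i" -le "$max_wait" ]; do
    # The drive counts as ready once the marker appears in the status output.
    if mt -f "$1" status 2>&1 | grep "${ready}" >/dev/null 2>&1; then
      return 0
    fi
    debug "Device $1 - not ready, retrying..."
    sleep 1
    i=$((i + 1))
  done
  # Unlike the quoted loop, report failure when the limit is exceeded.
  return 1
}
```

Called as `MAX_DRIVE_WAIT=1800 wait_for_drive /dev/nst0`, this would poll for roughly 30 minutes before giving up. The explicit return codes are a design choice of this sketch so that a caller can tell a timeout from success.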
Re: [Bacula-users] [Bacula-devel] Load slots timeout
Consider an admin job where the drive is cleaned. I'm very confused by the library needing to clean so often.

-- Dan Langille http://langille.org/

On Nov 10, 2014, at 7:45 AM, Andrey Chebotarev a...@525.su wrote:

OK, maybe there is another solution. Let's put everything in one place: I have very long jobs, which use 30-40 cartridges each. I have a library which has to be configured with auto cleaning. During a running job, after several cartridges are filled, say 10, the library wants to clean the tape drive and does it automatically. Cleaning takes more than 5 minutes. Meanwhile Bacula is trying to load a new cartridge and fails after 5 minutes. What is the best solution to my issue?

I wanted to change the code by myself. I just wanted to know where in the code the 5-minute timeout is defined.

On Nov 10, 2014, at 3:33 AM, Andrey Chebotarev a...@525.su wrote:

Hi. It's on devel because the question is about editing the sources, not about configuration. And I think only a developer could answer it.

The conclusion to change the code is premature.

On Nov 4, 2014, at 6:55 AM, Andrey Chebotarev a...@525.su wrote:

Hi guys. I use Bacula 5.2.13 with an IBM TS3200 library. Periodically I face a problem with a load slot timeout. In the logs it looks like:

22-Sep 00:35 baculasrv-dir JobId 10471: Recycled volume "220AAAL6"
22-Sep 00:35 baculasrv-sd JobId 10471: 3304 Issuing autochanger "load slot 1, drive 1" command.
22-Sep 00:40 baculasrv-sd JobId 10471: Fatal error: 3992 Bad autochanger "load slot 1, drive 1": ERR=Child died from signal 15: Termination. Results=Program killed by Bacula (timeout)
22-Sep 00:32 sqcompose-fd JobId 10471: Fatal error: backup.c:1019 Network send error to SD.

As I understand it, Bacula tries to load the slot for 5 minutes and stops the job if that fails. I started investigating why Bacula sometimes doesn't manage to load the slot in 5 minutes, and found that the library stops responding to commands and starts a drive cleaning procedure, which takes more than 5 minutes (the library was configured to clean the drive automatically). So how can I solve the problem? I found the place in the mtx-changer script where a 300-second wait is declared and increased the value to 1800 seconds. Is that the only place, or do I have to change something else in the sources?

Hi everybody. Increasing the timeout in the mtx-changer script hasn't helped; I have the same issue. Where in the sources can I increase the timeout to 30 minutes?

I don't see why this is on devel. It should be on users, which is cc'd here.
Re: [Bacula-users] [Bacula-devel] Load slots timeout
On Nov 10, 2014, at 7:45 AM, Andrey Chebotarev a...@525.su wrote:

OK, maybe there is another solution. Let's put everything in one place: I have very long jobs, which use 30-40 cartridges each. I have a library which has to be configured with auto cleaning. During a running job, after several cartridges are filled, say 10, the library wants to clean the tape drive and does it automatically. Cleaning takes more than 5 minutes. Meanwhile Bacula is trying to load a new cartridge and fails after 5 minutes. What is the best solution to my issue?

I'm sorry this took so long to get you any help. The above information, and the logs you provided in another post, should have been included with your initial post. Context is everything. Given the evidence presented, everyone correctly concluded you were having trouble with initial configuration. That was not the case.

I've modified and recompiled, waiting for the result until the next drive cleaning... Probably that's it:

stored/stored_conf.c: {"maximumchangerwait", store_time, ITEM(res_dev.max_changer_wait), 0, ITEM_DEFAULT, 5 * 60},

If this helps, I think it should/could become a configurable item, available in bacula-dir.conf. I also wonder what other installations do for cleaning in similar circumstances.

-- Dan Langille
Re: [Bacula-users] [Bacula-devel] Load slots timeout
There is no need to modify the code. Just modify the bacula-sd.conf file. It is much easier.

Best regards,
Kern

On 11/10/2014 04:14 PM, Andrey Chebotarev wrote:

I've modified and recompiled, waiting for result till next drive cleaning... Probably that's it:

stored/stored_conf.c: {"maximumchangerwait", store_time, ITEM(res_dev.max_changer_wait), 0, ITEM_DEFAULT, 5 * 60},

Here example what's going on. Job logs:

31-Oct 12:35 baculasrv-dir JobId 10967: Start Backup JobId 10967, Job=StorwizeBSRVData2BJob1.2014-10-31_12.35.06_34
31-Oct 12:35 baculasrv-dir JobId 10967: Recycled volume "346AAJL6"
31-Oct 12:35 baculasrv-dir JobId 10967: Using Device "IBM_TS3200_2_drive1" to write.
31-Oct 12:35 baculasrv-sd JobId 10967: 3307 Issuing autochanger "unload slot 46, drive 1" command.
31-Oct 12:36 baculasrv-sd JobId 10967: 3304 Issuing autochanger "load slot 12, drive 1" command.
31-Oct 12:36 baculasrv-sd JobId 10967: 3305 Autochanger "load slot 12, drive 1", status is OK.
31-Oct 12:36 baculasrv-sd JobId 10967: Recycled volume "346AAJL6" on device "IBM_TS3200_2_drive1" (/dev/nst0), all previous data lost.
31-Oct 12:36 baculasrv-dir JobId 10967: Max Volume jobs=1 exceeded. Marking Volume "346AAJL6" as Used.
31-Oct 23:25 baculasrv-sd JobId 10967: End of Volume "346AAJL6" at 2555:8545 on device "IBM_TS3200_2_drive1" (/dev/nst0). Write of 64512 bytes got -1.
31-Oct 23:25 baculasrv-sd JobId 10967: Re-read of last block succeeded.
31-Oct 23:25 baculasrv-sd JobId 10967: End of medium on Volume "346AAJL6" Bytes=2,555,387,735,040 Blocks=39,611,044 at 31-Oct-2014 23:25.
31-Oct 23:25 baculasrv-sd JobId 10967: 3307 Issuing autochanger "unload slot 12, drive 1" command.
31-Oct 23:28 baculasrv-dir JobId 10967: Recycled volume "349AAJL6"
31-Oct 23:28 baculasrv-sd JobId 10967: 3304 Issuing autochanger "load slot 34, drive 1" command.
31-Oct 23:28 baculasrv-sd JobId 10967: 3305 Autochanger "load slot 34, drive 1", status is OK.
31-Oct 23:28 baculasrv-sd JobId 10967: Recycled volume "349AAJL6" on device "IBM_TS3200_2_drive1" (/dev/nst0), all previous data lost.
31-Oct 23:28 baculasrv-dir JobId 10967: Max Volume jobs=1 exceeded. Marking Volume "349AAJL6" as Used.
31-Oct 23:28 baculasrv-sd JobId 10967: New volume "349AAJL6" mounted on device "IBM_TS3200_2_drive1" (/dev/nst0) at 31-Oct-2014 23:28.
01-Nov 10:25 baculasrv-sd JobId 10967: End of Volume "349AAJL6" at 2541:9287 on device "IBM_TS3200_2_drive1" (/dev/nst0). Write of 64512 bytes got -1.
01-Nov 10:25 baculasrv-sd JobId 10967: Re-read of last block succeeded.
01-Nov 10:25 baculasrv-sd JobId 10967: End of medium on Volume "349AAJL6" Bytes=2,541,436,498,944 Blocks=39,394,786 at 01-Nov-2014 10:25.
01-Nov 10:25 baculasrv-sd JobId 10967: 3307 Issuing autochanger "unload slot 34, drive 1" command.
01-Nov 10:27 baculasrv-dir JobId 10967: Recycled volume "225AAAL6"
01-Nov 10:27 baculasrv-sd JobId 10967: 3304 Issuing autochanger "load slot 3, drive 1" command.
01-Nov 10:28 baculasrv-sd JobId 10967: 3305 Autochanger "load slot 3, drive 1", status is OK.
01-Nov 10:28 baculasrv-sd JobId 10967: Recycled volume "225AAAL6" on device "IBM_TS3200_2_drive1" (/dev/nst0), all previous data lost.
01-Nov 10:28 baculasrv-dir JobId 10967: Max Volume jobs=1 exceeded. Marking Volume "225AAAL6" as Used.
01-Nov 10:28 baculasrv-sd JobId 10967: New volume "225AAAL6" mounted on device "IBM_TS3200_2_drive1" (/dev/nst0) at 01-Nov-2014 10:28.
01-Nov 23:20 baculasrv-sd JobId 10967: End of Volume "225AAAL6" at 2652:7176 on device "IBM_TS3200_2_drive1" (/dev/nst0). Write of 64512 bytes got -1.
01-Nov 23:20 baculasrv-sd JobId 10967: Re-read of last block succeeded.
01-Nov 23:20 baculasrv-sd JobId 10967: End of medium on Volume "225AAAL6" Bytes=2,652,293,210,112 Blocks=41,113,175 at 01-Nov-2014 23:20.
01-Nov 23:20 baculasrv-sd JobId 10967: 3307 Issuing autochanger "unload slot 3, drive 1" command.
01-Nov 23:22 baculasrv-dir JobId 10967: Recycled volume "345AAJL6"
01-Nov 23:22 baculasrv-sd JobId 10967: 3304 Issuing autochanger "load slot 11, drive 1" command.
01-Nov 23:23 baculasrv-sd JobId 10967: 3305 Autochanger "load slot 11, drive 1", status is OK.
01-Nov 23:23 baculasrv-sd JobId 10967: Recycled volume "345AAJL6" on device "IBM_TS3200_2_drive1" (/dev/nst0), all previous data lost.
01-Nov 23:23 baculasrv-dir JobId 10967: Max Volume jobs=1 exceeded. Marking Volume "345AAJL6" as Used.
01-Nov 23:23 baculasrv-sd JobId 10967: New volume "345AAJL6" mounted on device "IBM_TS3200_2_drive1" (/dev/nst0) at 01-Nov-2014 23:23.
02-Nov 10:21 baculasrv-sd JobId 10967: End of Volume "345AAJL6" at 2554:12632 on device "IBM_TS3200_2_drive1" (/dev/nst0). Write of 64512 bytes got -1.
02-Nov 10:22 baculasrv-sd JobId 10967: Re-read of last block succeeded.
02-Nov
Re: [Bacula-users] [Bacula-devel] Load slots timeout
On Nov 10, 2014, at 11:22 AM, Kern Sibbald k...@sibbald.com wrote:

There is no need to modify the code. Just modify the bacula-sd.conf file. It is much easier.

Best regards,
Kern

I think Kern refers to here:

http://www.bacula.org/5.0.x-manuals/en/main/main/Storage_Daemon_Configuratio.html

Maximum Changer Wait = time
This directive specifies the maximum time in seconds for Bacula to wait for an autochanger to change the volume. If this time is exceeded, Bacula will invalidate the Volume slot number stored in the catalog and try again. If no additional changer volumes exist, Bacula will ask the operator to intervene. The default is 5 minutes.

There are several 'wait' / 'time' options listed there.

-- Dan Langille
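Assuming the directive Kern means is Maximum Changer Wait in the Device resource of bacula-sd.conf, the change could be sketched as below. The script writes a sample Device fragment to a scratch file and lists any changer-wait setting it finds. The helper name show_changer_wait is hypothetical, the device name and /dev/nst0 are taken from the logs in this thread, and the fragment leaves out other directives a real Device resource needs. The value 1800 is the 30 minutes asked about earlier, given in seconds.

```shell
#!/bin/sh
# Hypothetical helper: list every "Maximum Changer Wait" line in an SD config.
# Bacula ignores case and spaces in directive names, so the match is loose
# (this sketch handles plain spaces only, not tabs).
show_changer_wait() {
  grep -i -n 'maximum *changer *wait' "$1"
}

# Sample fragment only; not a complete Device resource.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Device {
  Name = IBM_TS3200_2_drive1
  Archive Device = /dev/nst0
  Autochanger = yes
  # Default is 5 minutes (300 seconds); 1800 leaves room for a cleaning cycle.
  Maximum Changer Wait = 1800
}
EOF

show_changer_wait "$sample"
rm -f "$sample"
```

On a real system the helper would be pointed at the actual bacula-sd.conf, and the storage daemon would need a restart for a changed value to take effect.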
Re: [Bacula-users] [Bacula-devel] Load slots timeout
On Nov 8, 2014, at 7:28 PM, Dan Langille d...@langille.org wrote:

On Nov 4, 2014, at 6:55 AM, Andrey Chebotarev a...@525.su wrote:

Hi guys. I use Bacula 5.2.13 with an IBM TS3200 library. Periodically I face a problem with a load slot timeout. In the logs it looks like:

22-Sep 00:35 baculasrv-dir JobId 10471: Recycled volume "220AAAL6"
22-Sep 00:35 baculasrv-sd JobId 10471: 3304 Issuing autochanger "load slot 1, drive 1" command.
22-Sep 00:40 baculasrv-sd JobId 10471: Fatal error: 3992 Bad autochanger "load slot 1, drive 1": ERR=Child died from signal 15: Termination. Results=Program killed by Bacula (timeout)
22-Sep 00:32 sqcompose-fd JobId 10471: Fatal error: backup.c:1019 Network send error to SD.

As I understand it, Bacula tries to load the slot for 5 minutes and stops the job if that fails. I started investigating why Bacula sometimes doesn't manage to load the slot in 5 minutes, and found that the library stops responding to commands and starts a drive cleaning procedure, which takes more than 5 minutes (the library was configured to clean the drive automatically). So how can I solve the problem? I found the place in the mtx-changer script where a 300-second wait is declared and increased the value to 1800 seconds. Is that the only place, or do I have to change something else in the sources?

Hi everybody. Increasing the timeout in the mtx-changer script hasn't helped; I have the same issue. Where in the sources can I increase the timeout to 30 minutes?

I don't see why this is on devel. It should be on users, which is cc'd here.

Have you ensured everything is correct with the permissions etc.? Five minutes to load a tape is way too long for something which succeeds. My hypothesis: there is a problem with the process. Configuring an autochanger has many places where it can go astray.

I documented some of the pitfalls here: http://www.freebsddiary.org/tape-library.php

And more of the odd stuff here: http://www.freebsddiary.org/tape-library-integration.php

Hope that helps.

-- Dan Langille
Re: [Bacula-users] [Bacula-devel] Load slots timeout
Hello,

Assuming you are using the Bacula mtx-changer script. In case of really big problems with the mtx-changer script, the first step is to execute it by hand and see what it outputs. The second step is to set the environment variable debug_log=1 and look in the working directory at the file mtx.log. That should give a better idea of what is going on.

Best regards,
Kern

On 11/09/2014 05:14 PM, Dan Langille wrote:

On Nov 8, 2014, at 7:28 PM, Dan Langille d...@langille.org wrote:

On Nov 4, 2014, at 6:55 AM, Andrey Chebotarev a...@525.su wrote:

Hi guys. I use Bacula 5.2.13 with an IBM TS3200 library. Periodically I face a problem with a load slot timeout. In the logs it looks like:

22-Sep 00:35 baculasrv-dir JobId 10471: Recycled volume "220AAAL6"
22-Sep 00:35 baculasrv-sd JobId 10471: 3304 Issuing autochanger "load slot 1, drive 1" command.
22-Sep 00:40 baculasrv-sd JobId 10471: Fatal error: 3992 Bad autochanger "load slot 1, drive 1": ERR=Child died from signal 15: Termination. Results=Program killed by Bacula (timeout)
22-Sep 00:32 sqcompose-fd JobId 10471: Fatal error: backup.c:1019 Network send error to SD.

As I understand it, Bacula tries to load the slot for 5 minutes and stops the job if that fails. I started investigating why Bacula sometimes doesn't manage to load the slot in 5 minutes, and found that the library stops responding to commands and starts a drive cleaning procedure, which takes more than 5 minutes (the library was configured to clean the drive automatically). So how can I solve the problem? I found the place in the mtx-changer script where a 300-second wait is declared and increased the value to 1800 seconds. Is that the only place, or do I have to change something else in the sources?

Hi everybody. Increasing the timeout in the mtx-changer script hasn't helped; I have the same issue. Where in the sources can I increase the timeout to 30 minutes?

I don't see why this is on devel. It should be on users, which is cc'd here.

Have you ensured everything is correct with the permissions etc.? Five minutes to load a tape is way too long for something which succeeds. My hypothesis: there is a problem with the process. Configuring an autochanger has many places where it can go astray.

I documented some of the pitfalls here: http://www.freebsddiary.org/tape-library.php

And more of the odd stuff here: http://www.freebsddiary.org/tape-library-integration.php

Hope that helps.

-- Dan Langille
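Kern's first step, running the changer script by hand, can be combined with a simple timing wrapper to see how long a load really takes. The sketch below is only an illustration: time_changer is a hypothetical helper, and the device paths, slot, and drive index in the example call are placeholders for your site. The argument order shown (changer device, command, slot, archive device, drive index) assumes the usual Changer Command form "mtx-changer %c %o %S %a %d".

```shell
#!/bin/sh
# Hypothetical wrapper: run any command, then report its exit status and
# how many whole seconds it took.
time_changer() {
  start=$(date +%s)
  rc=0
  "$@" || rc=$?
  end=$(date +%s)
  echo "rc=$rc elapsed=$((end - start))s"
  return "$rc"
}

# Example invocation (all arguments are placeholders, adjust before use):
# time_changer /etc/bacula/scripts/mtx-changer /dev/sg0 load 1 /dev/nst0 0
```

If the elapsed time reported during a drive cleaning cycle is above 300 seconds, that would be consistent with the storage daemon killing the script at its default 5-minute limit.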
Re: [Bacula-users] [Bacula-devel] Load slots timeout
On 09/11/2014, at 17.14, Dan Langille d...@langille.org wrote:

Five minutes to load a tape is way too long for something which succeeds. My hypothesis: there is a problem with the process.

Yes and no. If the tape has been closed properly, then yes. If it hasn't, the first thing the drive is going to do is pass over the entire tape to get the end-marker position, which can take way more than 5 minutes and really isn't an error state.

And to my knowledge there is no way to detect it other than to wait.

Jesper
Re: [Bacula-users] [Bacula-devel] Load slots timeout
On 11/10/2014 06:54 AM, Jesper Krogh wrote:

On 09/11/2014, at 17.14, Dan Langille d...@langille.org wrote:

Five minutes to load a tape is way too long for something which succeeds. My hypothesis: there is a problem with the process.

Yes and no. If the tape has been closed properly, then yes. If it hasn't, the first thing the drive is going to do is pass over the entire tape to get the end-marker position, which can take way more than 5 minutes and really isn't an error state.

When Bacula is using mtx (the mtx-changer script) to load a tape, it does not move the tape to the end-of-media marker. This will happen later, and only if Bacula is going to write on the tape. Thus moving to the end of the media is not part of the timeout for loading a tape, unless you have some really non-standard tape drive or OS kernel driver, which I have never heard of.

Best regards,
Kern

And to my knowledge there is no way to detect it other than to wait.

Jesper
Re: [Bacula-users] [Bacula-devel] Load slots timeout
On Nov 4, 2014, at 6:55 AM, Andrey Chebotarev a...@525.su wrote:

Hi guys. I use Bacula 5.2.13 with an IBM TS3200 library. Periodically I face a problem with a load slot timeout. In the logs it looks like:

22-Sep 00:35 baculasrv-dir JobId 10471: Recycled volume "220AAAL6"
22-Sep 00:35 baculasrv-sd JobId 10471: 3304 Issuing autochanger "load slot 1, drive 1" command.
22-Sep 00:40 baculasrv-sd JobId 10471: Fatal error: 3992 Bad autochanger "load slot 1, drive 1": ERR=Child died from signal 15: Termination. Results=Program killed by Bacula (timeout)
22-Sep 00:32 sqcompose-fd JobId 10471: Fatal error: backup.c:1019 Network send error to SD.

As I understand it, Bacula tries to load the slot for 5 minutes and stops the job if that fails. I started investigating why Bacula sometimes doesn't manage to load the slot in 5 minutes, and found that the library stops responding to commands and starts a drive cleaning procedure, which takes more than 5 minutes (the library was configured to clean the drive automatically). So how can I solve the problem? I found the place in the mtx-changer script where a 300-second wait is declared and increased the value to 1800 seconds. Is that the only place, or do I have to change something else in the sources?

Hi everybody. Increasing the timeout in the mtx-changer script hasn't helped; I have the same issue. Where in the sources can I increase the timeout to 30 minutes?

I don't see why this is on devel. It should be on users, which is cc'd here.

-- Dan Langille