Re: [Bacula-users] Fw: Rif: Re: Change tape problem
Hello,

The error you describe below sounds more like a hardware or OS driver error than a Bacula error. That would also fit with the fact that it started with 1.38.11 and continues. You didn't give any details of what happened between 1.36.3 and 1.38.11, nor did you give any details on your OS. This error sounds very much like the kinds of problems that occur on FreeBSD systems when they freeze the driver (a very stupid thing to do IMO, but they do it). To deal with this problem, Bacula clears the device error status at every point where it gets an error, so I really don't see any reason to apply your patch. I suspect that your patch is serving to cover up some sort of driver or hardware problem.

Regards,

Kern

On Friday 16 March 2007 09:39, Ferdinando Pasqualetti wrote:
> Hi Kern,
>
> sorry to ask you directly. Some days ago I sent the same email to the devel list, but I got no answer, so I am making a last try. Please tell me whether the problem is not interesting because no one else has reported it, so it is not worth trying to understand, or whether it is not clearly defined.
>
> I am following Arno's suggestion about a problem with writing additional tapes that started with 1.38.11 and persists in 2.0.2. Arno's idea is that the problem could be in the tape drive or the OS, and maybe he is right, but the same hardware/OS with 1.36.3 did not have the problem (there could also have been a hardware failure in the meantime, but apparently everything else works).
>
> What I think happens is this: once there has been an ENOSPC error writing a tape block, the last block is correctly reread and the autochanger swaps in a prelabelled tape, then the label is rewritten (or skipped, I do not know), and when the block is written to the new tape it gets the ENOSPC error again. This is again reported as an EOT, but the last-block check fails and the job fails too (this is a consequential error anyway).
>
> I have also made a very small modification to the block.c routine (I am not a programmer, and moreover I am not a C programmer, so this change is surely not correct, even if it solves the problem in some way). The change is:
>
>   532d531
>   if (dev->file == 0) {
>      dev->clrerror(-1);
>   }
>
> which simply means (I hope) "clear all errors before writing the block if this is the first file on the tape". I did that because the writes keep getting the error until an EOF mark is written. Because the file is normally around 1 GB, this slows performance down to 20-25% during this phase, but I can live with it.
>
> What I would like to know is whether, in your opinion, this behaviour is really a hardware/firmware/OS problem, and whether the block routine could be made more resilient in some way (an EOF mark at tape end, closing and reopening the fd, or whatever).
>
> Many thanks if you would like to give me an answer, and very many thanks anyway for this great package.
>
> Sincerely,
> --
> Ferdinando Pasqualetti
> G.T.Dati srl
> Tel. 0557310862 - 3356172731 - Fax 055720143
>
>> Hi,
>>
>> On 3/11/2007 6:33 PM, Ferdinando Pasqualetti wrote:
>>> Hi Arno,
>>>
>>> I made some tests and this is what I think. When there is a tape change after an out-of-space error, subsequent block writes keep getting that error even after the robot has changed the tape. This continues. I made some changes to the block.c routine
>>
>> You'd better discuss this on bacula-devel, I think, or send Kern a mail explaining the problem and the resolution.
>>
>>> (very simple, because I'm not a C programmer and I also don't know the logic of the SD program). I made the routine enter the retry loop even for ENOSPC if the file number is 0. This made bacula-sd work correctly (but it took 20 hours to write file 0). After the EOF mark is written, the speed is normal again. My idea is that changing the tape does not reset the EOD condition on the tape
>>
>> That sounds like a bug, either in the hardware or the HBA driver.
>>
>>> until a file mark is written. I do not know if this is a device or an OS error, but I believe that the tape fd should be closed and reopened on a tape change.
>>
>> I know next to nothing about these details, so I won't comment on it...
>>
>>> dd and mt tests always gave correct results, but dd always writes an EOF mark at the end of the transfer. If you have any idea about that I will be very happy. Thank you very much in any case.
>>
>> Difficult problem, I think. If this is a hardware or driver problem, I don't think modifying the SD code is the right solution. If it works for you, fine, but you might have to maintain that patch for your installation yourself.
>>
>> Arno
>>
>>> --
>>> Ferdinando Pasqualetti
>>> G.T.Dati srl
>>> Tel. 0557310862 - 3356172731 - Fax 055720143
>>> *Ferdinando Pasqualetti/San
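The retry-loop workaround described in the exchange with Arno (retry a write that fails with ENOSPC, but only while the file number is still 0) can be illustrated with a small, self-contained C model. This is a sketch under stated assumptions, not Bacula's block.c: `struct sim_tape`, `sim_write_block()`, `sim_write_filemark()` and `write_block_with_retry()` are hypothetical names, and the simulated drive simply encodes Ferdinando's unconfirmed hypothesis that a stale EOD state survives the tape change until a filemark is written.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical simulated drive used only to illustrate the workaround;
 * it is NOT a Bacula structure or a real device interface. */
struct sim_tape {
    int  file;           /* current file number on the tape (0 = first file) */
    bool stale_eod;      /* hypothesised leftover EOD state after a tape change */
    bool fail_next;      /* the next write attempt will report ENOSPC */
    long attempts;       /* write attempts issued, to show the retry cost */
    long blocks_written; /* blocks that actually reached the tape */
};

/* Simulated block write. While stale_eod is set, every block first fails
 * with ENOSPC and only succeeds when retried. Returns 0, or -1 with errno set. */
static int sim_write_block(struct sim_tape *t)
{
    t->attempts++;
    if (t->stale_eod && t->fail_next) {
        t->fail_next = false;
        errno = ENOSPC;
        return -1;
    }
    t->fail_next = t->stale_eod;  /* the following block fails again */
    t->blocks_written++;
    return 0;
}

/* Simulated filemark: under the hypothesis, this clears the stale EOD state. */
static void sim_write_filemark(struct sim_tape *t)
{
    t->file++;
    t->stale_eod = false;
    t->fail_next = false;
}

/* The workaround idea: treat ENOSPC as retryable only while still in file 0.
 * Anywhere else, ENOSPC is taken as a genuine end of medium. */
static int write_block_with_retry(struct sim_tape *t, int max_retries)
{
    for (int i = 0; i <= max_retries; i++) {
        if (sim_write_block(t) == 0)
            return 0;
        if (errno != ENOSPC || t->file != 0)
            return -1;  /* real error, or real end of medium */
    }
    return -1;          /* still failing after all retries */
}
```

In this model, ten blocks written to file 0 with the stale state set cost twenty write attempts, and after `sim_write_filemark()` the next ten cost ten, which mirrors the reported slowdown only qualitatively. As Kern points out, a retry of this kind can just as easily hide a real driver or hardware fault.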
[Bacula-users] Fw: Rif: Re: Change tape problem
Hi everybody,

I am following Arno's suggestion about a problem with writing additional tapes that started with 1.38.11 and persists in 2.0.2. Arno's idea is that the problem could be in the tape drive or the OS, and maybe he is right, but the same hardware/OS with 1.36.3 did not have the problem (there could also have been a hardware failure in the meantime, but apparently everything else works).

What I think happens is this: once there has been an ENOSPC error writing a tape block, the last block is correctly reread and the autochanger swaps in a prelabelled tape, then the label is rewritten (or skipped, I do not know), and when the block is written to the new tape it gets the ENOSPC error again. This is again reported as an EOT, but the last-block check fails and the job fails too.

I have also made a very small modification to the block.c routine (I am not a programmer, and moreover I am not a C programmer, so this change is surely not correct, even if it solves the problem in some way). The change is:

  532d531
  if (dev->file == 0) {
     dev->clrerror(-1);
  }

which simply means (I hope) "clear all errors before writing the block if this is the first file on the tape". I did that because the writes keep getting the error until an EOF mark is written. Because the file is normally around 1 GB, this slows performance down to 20-25% during this phase, but I can live with it.

What I would like to know is whether, in your opinion, this behaviour is really a hardware/firmware/OS problem, and whether the block routine could be made more resilient in some way (an EOF mark at tape end, closing and reopening the fd, or whatever).

Many thanks if you would like to give me an answer, and very many thanks anyway for this great package.

--
Ferdinando Pasqualetti
G.T.Dati srl
Tel. 0557310862 - 3356172731 - Fax 055720143

> Hi,
>
> On 3/11/2007 6:33 PM, Ferdinando Pasqualetti wrote:
>> Hi Arno,
>>
>> I made some tests and this is what I think. When there is a tape change after an out-of-space error, subsequent block writes keep getting that error even after the robot has changed the tape. This continues. I made some changes to the block.c routine
>
> You'd better discuss this on bacula-devel, I think, or send Kern a mail explaining the problem and the resolution.
>
>> (very simple, because I'm not a C programmer and I also don't know the logic of the SD program). I made the routine enter the retry loop even for ENOSPC if the file number is 0. This made bacula-sd work correctly (but it took 20 hours to write file 0). After the EOF mark is written, the speed is normal again. My idea is that changing the tape does not reset the EOD condition on the tape
>
> That sounds like a bug, either in the hardware or the HBA driver.
>
>> until a file mark is written. I do not know if this is a device or an OS error, but I believe that the tape fd should be closed and reopened on a tape change.
>
> I know next to nothing about these details, so I won't comment on it...
>
>> dd and mt tests always gave correct results, but dd always writes an EOF mark at the end of the transfer. If you have any idea about that I will be very happy. Thank you very much in any case.
>
> Difficult problem, I think. If this is a hardware or driver problem, I don't think modifying the SD code is the right solution. If it works for you, fine, but you might have to maintain that patch for your installation yourself.
>
> Arno
>
>> --
>> Ferdinando Pasqualetti
>> G.T.Dati srl
>> Tel. 0557310862 - 3356172731 - Fax 055720143
>>
>> *Ferdinando Pasqualetti/San Lazzaro/Conserve Italia*
>> 27/02/2007 09.47
>> To: Arno Lehmann [EMAIL PROTECTED]
>> CC: bacula-users bacula-users@lists.sourceforge.net
>> Subject: Rif: Re: [Bacula-users] Change tape problem
>>
>> Hi Arno,
>>
>> thank you very much for your answer. I will try the tests you are suggesting as soon as possible. By the way, I purged the volumes involved in the error shown in the original message (it was the third try), restarted the backup job, and here is the (correct) result:
>>
>> 25-feb 19:55 bacula-dir: Start Backup JobId 12927, Job=webfs3-job.2007-02-25_19.55.40
>> 25-feb 19:55 bacula-dir: Recycled volume web-004
>> 25-feb 19:55 webfs3: ClientRunBeforeJob: run command /root/restartsmb
>> 25-feb 19:55 webfs3: ClientRunBeforeJob: Shutting down SMB services: [ OK ]
>> 25-feb 19:55 webfs3: ClientRunBeforeJob: smbd: no process terminated
>> 25-feb 19:55 webfs3: ClientRunBeforeJob: smbd: no process terminated
>> 25-feb 19:55 webfs3: ClientRunBeforeJob: Starting SMB services: [ OK ]
>> 25-feb 19:55 webfs3: ClientRunBeforeJob: [ OK ]
>> 25-feb 19:55 bacula-sd: 3307 Issuing autochanger unload slot 7, drive 0 command.
>> 25-feb 19:57 bacula-sd: 3304 Issuing autochanger load slot 3, drive 0 command.
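The two mitigations Ferdinando asks about (writing an EOF mark, and closing and reopening the tape descriptor on a tape change) map onto ordinary Linux tape-driver calls. The following is a minimal, Linux-only sketch, assuming the SCSI tape (st) driver interface from `<sys/mtio.h>`; `write_filemarks()` and `reopen_device()` are hypothetical helper names, not Bacula functions. The filemark call uses the same `MTWEOF` operation that `mt weof` issues from the shell. Closing a tape descriptor after writing also makes the st driver write a filemark, which may be why the dd tests never showed the problem.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>
#include <unistd.h>

/* Write `count` filemarks (EOF marks) at the current tape position via the
 * st driver's MTIOCTOP ioctl. Returns 0 on success, -1 with errno set.
 * On something that is not a tape device the ioctl fails (usually ENOTTY). */
static int write_filemarks(int fd, int count)
{
    struct mtop op;
    op.mt_op = MTWEOF;
    op.mt_count = count;
    return ioctl(fd, MTIOCTOP, &op);
}

/* Close the old descriptor (if any) and open the device again, so that no
 * driver state tied to the previous descriptor is reused for the new volume.
 * Errors from close() are ignored here for brevity. Returns the new fd,
 * or -1 with errno set. */
static int reopen_device(int oldfd, const char *path)
{
    if (oldfd >= 0)
        close(oldfd);
    return open(path, O_RDWR);
}
```

A caller would typically write the filemark before releasing the old volume and reopen after the autochanger has loaded the next one, for example `fd = reopen_device(fd, "/dev/nst0");`, where `/dev/nst0` is an assumed non-rewinding device path. Whether either step actually clears the condition described in this thread is untested; per Kern and Arno, it may only mask a driver or hardware fault.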