Re: [Bacula-users] Fw: Rif: Re: Change tape problem

2007-03-20 Thread Kern Sibbald
Hello,

The error you describe below sounds more like a hardware or OS driver error 
than a Bacula error.  That also fits with the fact that it started 
with 1.38.11 and continues.  You didn't give any details of what happened 
between 1.36.3 and 1.38.11, nor did you give any details on your OS. 

This error sounds very much like the kind of problem that occurs on FreeBSD 
systems when they freeze the driver (a very stupid thing to do IMO, but they 
do it).  To deal with this problem, Bacula clears the device error status at 
every point where it gets an error, so I really don't see any reason to apply 
your patch.  

I suspect that your patch is serving to cover up some sort of driver or 
hardware problem.

Regards,

Kern


[Bacula-users] Fw: Rif: Re: Change tape problem

2007-03-16 Thread Ferdinando Pasqualetti

Hi Kern,
sorry to write to you directly. Some days ago I sent the same email to the
devel list, but I got no answer, so I am making a last try. Please tell me
if the problem is not interesting because no one else has reported it, if it
is not worth trying to understand, or if it is not clearly defined.

Following Arno's suggestion, I am writing about a problem with writing
additional tapes, which started with 1.38.11 and persists in 2.0.2. Arno's
idea is that the problem could be in the tape drive or the OS, and maybe he
is right, but the same hardware and OS did not have the problem with 1.36.3
(there could also have been a hardware failure in the meantime, but
apparently everything else works).

What I think happens is this: once there has been an ENOSPC error while
writing a tape block, the last block is correctly reread and the autochanger
replaces the tape with a prelabelled one. The label is then rewritten (or
skipped, I do not know), and when the block is written to the new tape it
gets the ENOSPC error again. This is reported again as an EOT, but the
last-block check fails and the job fails too (this is a consequential error
anyway).

I have also made a very small modification to the block.c routine (I am not
a programmer, and even less a C programmer, so this change is surely not the
correct one, even if it does solve the problem in some way). The change is:

532d531
  if (dev->file == 0) { dev->clrerror(-1); }

This simply means (I hope): clear all errors before writing the block if
this is the first file on the tape. It is needed because the write keeps
getting the error until an EOF mark is written. Because a file is normally
around 1 GB, this slows performance down to 20-25% during this phase, but I
can live with that.
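
The effect of this one-line guard can be illustrated with a toy model. The sketch below is not Bacula code: `MockDevice`, `latched_eot`, `hit_end_of_tape_and_change` and `write_block_with_guard` are hypothetical names, and the assumption that the driver keeps reporting ENOSPC after the tape change until the error state is cleared is only the hypothesis described in this message.

```cpp
#include <cassert>
#include <cerrno>

// Toy stand-in for a tape drive whose driver keeps ("latches") an
// out-of-space condition across a tape change. Purely hypothetical;
// this is not Bacula's DEVICE class.
struct MockDevice {
    int file = 0;              // current tape file number, 0 on a fresh tape
    bool latched_eot = false;  // driver still remembers the old ENOSPC
    int clears = 0;            // number of clrerror() calls, for inspection

    // Stand-in for dev->clrerror(-1): drop the remembered error state.
    void clrerror(int /*func*/) {
        latched_eot = false;
        ++clears;
    }

    // Simulate filling the tape and having the autochanger swap it.
    void hit_end_of_tape_and_change() {
        latched_eot = true;  // the last write failed with ENOSPC
        file = 0;            // the new tape is positioned in file 0
    }

    // Returns 0 on success, or ENOSPC while the stale condition is latched.
    int write_block() const {
        return latched_eot ? ENOSPC : 0;
    }
};

// The guard from the message: clear errors before writing, but only
// while the device is still in the first tape file.
int write_block_with_guard(MockDevice *dev) {
    assert(dev != nullptr);
    if (dev->file == 0) { dev->clrerror(-1); }
    return dev->write_block();
}
```

In this model a plain `write_block()` right after the tape change still returns ENOSPC, while `write_block_with_guard()` succeeds because `file` is 0. Once `file` is non-zero the guard no longer clears anything, which matches the intent of limiting the change to the first tape file.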

What I would like to know is whether, in your opinion, this behaviour really
is a hardware/firmware/OS problem, and whether the block routine could be
made more resilient in some way (an EOF mark at the end of the tape, closing
and reopening the fd, or whatever).
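
The two alternatives mentioned in parentheses could look roughly like this on a system that provides `<sys/mtio.h>` (Linux, for example). This is a sketch under assumptions: `write_filemark` and `reopen_device` are hypothetical helper names, not Bacula functions, and whether either step actually resets the stuck end-of-tape condition on this drive is untested.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>
#include <unistd.h>

// Write one filemark (EOF mark) at the current tape position.
// Returns 0 on success, -1 on failure with errno set by ioctl().
int write_filemark(int fd) {
    struct mtop op;
    op.mt_op = MTWEOF;  // tape operation: write end-of-file mark
    op.mt_count = 1;    // write a single filemark
    return ioctl(fd, MTIOCTOP, &op);
}

// Close the current descriptor (if any) and open the device node again.
// Returns the new descriptor, or -1 if open() fails.
int reopen_device(int fd, const char *path, int flags) {
    assert(path != nullptr);
    if (fd >= 0) {
        close(fd);  // release the old descriptor before reopening
    }
    return open(path, flags);
}
```

On anything that is not a tape device the `MTIOCTOP` ioctl simply fails, so `write_filemark` is safe to try against an ordinary file descriptor while experimenting.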

Many thanks if you can give me an answer, and many thanks anyway for this
great package.

Sincerely,

--
Ferdinando Pasqualetti
G.T.Dati srl
Tel. 0557310862 - 3356172731 - Fax 055720143




[Bacula-users] Fw: Rif: Re: Change tape problem

2007-03-13 Thread Ferdinando Pasqualetti

Hi everybody,
Following Arno's suggestion, I am writing about a problem with writing
additional tapes, which started with 1.38.11 and persists in 2.0.2. Arno's
idea is that the problem could be in the tape drive or the OS, and maybe he
is right, but the same hardware and OS did not have the problem with 1.36.3
(there could also have been a hardware failure in the meantime, but
apparently everything else works).
What I think happens is this: once there has been an ENOSPC error while
writing a tape block, the last block is correctly reread and the autochanger
replaces the tape with a prelabelled one. The label is then rewritten (or
skipped, I do not know), and when the block is written to the new tape it
gets the ENOSPC error again. This is reported again as an EOT, but the
last-block check fails and the job fails too.
I have also made a very small modification to the block.c routine (I am not
a programmer, and even less a C programmer, so this change is surely not the
correct one, even if it does solve the problem in some way). The change is:

532d531
  if (dev->file == 0) { dev->clrerror(-1); }

This simply means (I hope): clear all errors before writing the block if
this is the first file on the tape. It is needed because the write keeps
getting the error until an EOF mark is written. Because a file is normally
around 1 GB, this slows performance down to 20-25% during this phase, but I
can live with that.

What I would like to know is whether, in your opinion, this behaviour really
is a hardware/firmware/OS problem, and whether the block routine could be
made more resilient in some way (an EOF mark at the end of the tape, closing
and reopening the fd, or whatever).

Many thanks if you can give me an answer, and many thanks anyway for this
great package.



--
Ferdinando Pasqualetti
G.T.Dati srl
Tel. 0557310862 - 3356172731 - Fax 055720143



Hi,

On 3/11/2007 6:33 PM, Ferdinando Pasqualetti wrote:
 
 Hi Arno,
 I made some tests and this is what I think.
 When there is a tape change after an out-of-space error, subsequent block
 writes continue to get that error even after the robot has changed the
 tape, and this goes on.
 I made some changes to the block.c routine

You'd better discuss this at bacula-devel, I think, or send Kern a mail
explaining the problem and the resolution.

 (very simple, because I'm not
 a C programmer and I also don't know the logic of the sd program). I made
 the routine enter the retry loop even for ENOSPC if the file number is 0.
 This made bacula-sd work correctly (but it took 20 hours to write file
 0). After the EOF mark is written, the speed is normal again.
 My idea is that changing the tape does not reset the EOD condition on
 the tape

That sounds like a bug, either in the hardware or the HBA driver.

 until a file mark is written. I do not know whether this is a fault in the
 device or in the OS, but I believe that the fd of the tape should be closed
 and reopened on a tape change.

I know next to nothing about these details, so I won't comment on it...

 dd and mt tests always gave correct results, but dd always writes an EOF
 mark at the end of the transfer.
 
 If you have any ideas about this I will be very happy. Thank you very
 much in any case.

Difficult problem, I think.

If this is a hardware or driver problem, I don't think modifying the SD
code is the right solution.

If it works for you - fine, but it might be that you have to manage that
patch for your installation yourself.

Arno

 
 --
 Ferdinando Pasqualetti
 G.T.Dati srl
 Tel. 0557310862 - 3356172731 - Fax 055720143
 
 
 
 
 
 *Ferdinando Pasqualetti/San Lazzaro/Conserve Italia*
 27/02/2007 09.47
 
 To: Arno Lehmann [EMAIL PROTECTED]
 CC: bacula-users bacula-users@lists.sourceforge.net
 Subject: Rif: Re: [Bacula-users] Change tape problem
 
 Hi Arno,
 thank you very much for your answer. I will try the tests you are
 suggesting asap. By the way, I purged the volumes involved in the error
 shown in the original message (it was the third try), restarted the backup
 job, and here is the (correct) result.
 
 25-feb 19:55 bacula-dir: Start Backup JobId 12927, Job=webfs3-job.2007-02-25_19.55.40
 25-feb 19:55 bacula-dir: Recycled volume web-004
 25-feb 19:55 webfs3: ClientRunBeforeJob: run command /root/restartsmb
 25-feb 19:55 webfs3: ClientRunBeforeJob: Shutting down SMB services: [ OK ]
 25-feb 19:55 webfs3: ClientRunBeforeJob: smbd: nessun processo terminato
 25-feb 19:55 webfs3: ClientRunBeforeJob: smbd: nessun processo terminato
 25-feb 19:55 webfs3: ClientRunBeforeJob: Starting SMB services: [ OK ]
 25-feb 19:55 webfs3: ClientRunBeforeJob: [ OK ]
 25-feb 19:55 bacula-sd: 3307 Issuing autochanger unload slot 7, drive 0 command.
 25-feb 19:57 bacula-sd: 3304 Issuing autochanger load slot 3, drive 0 command.