Dear All,

I'd like to let you know that I have finally solved the problem.
Here is a short description of how I traced it.

1. Running with '-d100', I found that the main problem
is in the src/stored/dev.c code. During the FSF to EOD,
in the last step (the read before fsf), I got:

------------------------
(...)
Sd:pl-admbackup01: dev.c:1348 Doing read before fsf
Sd:pl-admbackup01: dev.c:1363 Issuing request sense...
Sd:pl-admbackup01: dev.c:1410 Set ST_EOT read errno=5. ERR=Input/output error
Sd:pl-admbackup01: dev.c:1413 dev.c:1412 read error on "Dev:TS3310-drv0" (/dev/IBMtape0n). ERR=Input/output error.
(...)
------------------------

AFAIK errno=5 (EIO) only means "something went wrong". So I
decided to find out what exactly goes wrong in this operation.
I assumed this was an IBM-specific case, so I started digging
through the IBM site. I read the "IBM Tape Device Drivers
Programming Reference", found the chapter about the
SIOC_REQSENSE command (ioctl), and added some diagnostic code
to the source. The next time I started the SD, I got this in
the SD log:

------------------------
(...)
Sd:pl-admbackup01: dev.c:1293 fsf
Sd:pl-admbackup01: dev.c:1338 FSF has cap_fsf
Sd:pl-admbackup01: dev.c:1348 Doing read before fsf
Sd:pl-admbackup01: dev.c:1363 Issuing request sense...
Information Field Valid Bit-----1
Error Code----------------------0x70
Segment Number------------------0x00
filemark Detected Bit----------0
End Of Medium Bit---------------0
Illegal Length Indicator Bit----0
Sense Key-----------------------0x08
Information Bytes-------------0x00 0x00 0xfc 0x00
Additional Sense Length---------0x58
Command Specific Information----0x00 0x00 0x00 0x00
Additional Sense Code-----------0x00
Additional Sense Code Qualifier-0x05
Field Replaceable Unit Code-----0x30
Sense Key Specific Valid Bit----0
Sd:pl-admbackup01: dev.c:1410 Set ST_EOT read errno=5. ERR=Input/output error
Sd:pl-admbackup01: dev.c:1413 dev.c:1412 read error on "Dev:TS3310-drv0" (/dev/IBMtape0n). ERR=Input/output error.
Sd:pl-admbackup01: dev.c:1466 Return -1 from FSF
(...)
-------------------------
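
For readers who want to reproduce such a dump without the IBM header,
the fields above follow the standard SCSI fixed-format sense layout.
The following is only a minimal sketch, assuming you already have the
raw sense bytes in a buffer; the struct fixed_sense type and the
decode_fixed_sense() function are hypothetical names made up for this
illustration and are not part of the IBM driver or of Bacula.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields printed in the dump above.
 * This is NOT the IBM driver's struct request_sense. */
struct fixed_sense {
    unsigned valid;      /* Information Field Valid bit */
    unsigned error_code; /* 0x70 = current error, 0x71 = deferred error */
    unsigned filemark;   /* Filemark Detected bit */
    unsigned eom;        /* End Of Medium bit */
    unsigned ili;        /* Illegal Length Indicator bit */
    unsigned sense_key;  /* e.g. 0x08 = Blank Check */
    unsigned asc;        /* Additional Sense Code */
    unsigned ascq;       /* Additional Sense Code Qualifier */
};

/* Decode the leading bytes of a fixed-format sense buffer.
 * Returns 0 on success, -1 if the buffer is too short to hold
 * ASC/ASCQ or is not in fixed format. */
int decode_fixed_sense(const uint8_t *buf, size_t len, struct fixed_sense *out)
{
    if (buf == NULL || out == NULL || len < 14) {
        return -1;
    }
    unsigned code = buf[0] & 0x7fu;      /* low 7 bits: response code */
    if (code != 0x70 && code != 0x71) {
        return -1;                       /* descriptor format or garbage */
    }
    out->valid      = (buf[0] >> 7) & 1u;
    out->error_code = code;
    out->filemark   = (buf[2] >> 7) & 1u;
    out->eom        = (buf[2] >> 6) & 1u;
    out->ili        = (buf[2] >> 5) & 1u;
    out->sense_key  = buf[2] & 0x0fu;    /* low 4 bits of byte 2 */
    out->asc        = buf[12];
    out->ascq       = buf[13];
    return 0;
}
```

Feeding it bytes equivalent to the dump above (0xF0 in byte 0, 0x08 in
byte 2, 0x00 and 0x05 in bytes 12 and 13) should give sense_key 0x08
with asc/ascq 0x00/0x05.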

OK, now I had more details. I opened the "IBM System Storage
TS3310 Tape Library Setup and Operator Guide" and found
this:
* About "Sense key": 8 --- Blank Check
* About "ASC/ASCQ" : 00 05 - EOD: Read or Space command
  terminated early because End of Data was encountered
Aha! To me it is clear that this is a quasi-error,
because in our case it is exactly what we expect. I wrote
yet another patch. When the read before fsf fails, I check
the sense code, and if it is the known one (as above), I set stat=0.
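
The decision itself can be isolated from the driver. The sketch below
is only an illustration of that check, assuming the sense fields have
already been read (for example via SIOC_REQSENSE); the function name
sense_means_end_of_data() and the macro names are hypothetical and do
not appear in the attached patch.

```c
#include <stdbool.h>

/* Values taken from the sense dump and the TS3310 guide quoted above.
 * The macro names are made up for this sketch. */
#define SENSE_CODE_CURRENT_FIXED 0x70u /* current error, fixed format */
#define SENSE_KEY_BLANK_CHECK    0x08u /* Blank Check */
#define ASC_NO_ADDITIONAL_INFO   0x00u
#define ASCQ_END_OF_DATA         0x05u /* End of Data encountered */

/* Returns true only when the request-sense ioctl itself succeeded
 * (ioctl_rc == 0) and the sense data says "Blank Check, End of Data".
 * Any other combination is treated as a real read error. */
bool sense_means_end_of_data(int ioctl_rc, unsigned err_code,
                             unsigned key, unsigned asc, unsigned ascq)
{
    return ioctl_rc == 0 &&
           err_code == SENSE_CODE_CURRENT_FIXED &&
           key == SENSE_KEY_BLANK_CHECK &&
           asc == ASC_NO_ADDITIONAL_INFO &&
           ascq == ASCQ_END_OF_DATA;
}
```

Keeping the match this narrow is deliberate: only the one sense
combination observed on this drive is downgraded to "end of data",
and every other EIO still goes down the original error path.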

-------------------------
(...)
The patch is not inlined here; it can be found in the attachment.
(...)
-------------------------

After applying this, fsf now finds EOD correctly:

-------------------------
(...)
Sd:pl-admbackup01: dev.c:1300 Doing read before fsf
Sd:pl-admbackup01: dev.c:1302 **MQ** state: 1040, errno: 5
Sd:pl-admbackup01: dev.c:1316 ***MQ*** Issuing request sense...
Sd:pl-admbackup01: dev.c:1322 ***MQ*** Seems OK now...
Sd:pl-admbackup01: dev.c:1340 End of File mark from read. File=11
Sd:pl-admbackup01: dev.c:1344 Set ST_EOT
Sd:pl-admbackup01: dev.c:1388 Return 0 from FSF
Sd:pl-admbackup01: dev.c:1390 ST_EOF set on exit FSF
Sd:pl-admbackup01: dev.c:1393 ST_EOT set on exit FSF
Sd:pl-admbackup01: dev.c:1395 Return from FSF file=10
(...)
-------------------------

The problem was only partially solved, because I still
could not store more than two jobs on a tape (if the tapes
were reloaded).

In fact, I was close to solving it in July. I made this
patch and sent it to the list (my post of 19th July).
I had the same problems back then. But this time I had my
'Eureka!' moment.

The reason was the "BSF at EOM = yes" setting. The number of
files found when scanning the tape didn't match the number in
the database. I removed this line and, finally, everything has
been working perfectly for about two weeks!!!

I only wonder why 'btape' insisted on that setting. Anyway,
limited trust was a good move in this case :-)

--

I don't know if it is an IBM driver-specific feature to report
EOD in this manner, but I am sending the patch to the list again.
Perhaps it could be reviewed/modified and applied to the main
tree, or added to a 'contrib' dir (which doesn't exist yet)
as a possible solution for cases similar to mine.

Regards,

Mariusz Czulada

P.S. This means that for me the problem is over, and the
IBM TS3310 library can now be treated as a supported
platform :-)


On 26-10-2007 at 18:38, Kern Sibbald wrote:
> Hello,
> 
> Well, the test command is failing on a fsf command (forward space file).
> This means that there is something incorrectly set in the tape driver.
> Since you have an IBM system, it is possible that they have put your
> drive in SysV mode rather than BSD mode.  Bacula works only in BSD mode,
> which is the default for Linux.  Please see the Tape Testing chapter of
> the manual for details on verifying and setting the mode.
> 
> Regards,
> 
> Kern
> 
> On Thursday 25 October 2007 15:20, Mariusz Czulada wrote:
> > Hi,
> >
> > Following the suggestions given, I set the devices in the configuration
> > files and upgraded almost everything I could. Of course (damn it) still
> > nothing works as it should. Since some months have passed since my first
> > posts, I'll try to describe my problem from the start, and stop at the
> > first failure so far.
> >
> > As a reminder, my system is:
> > * xSeries 346, CentOS 4.4 (like RH ES4)
> > * Two DS4000 FC 4 Gb PCI-X Single Port HBA;
> >   in system identified as QLogic QLA2460
> >   - PCI-X 2.0 to 4Gb FC, Single Channel
> > * TS3310 library, with two ULT3580-TD3 drives,
> >   connected directly (1 to 1) to HBAs.
> >
> > I upgraded QLogic firmware, patched kernel, upgraded QLogic drivers and
> > IBM tape tools. Now I have:
> > * QLogic BIOS Version: 1.29, FCode: 1.27, EFI: 1.09, Firmware: 4.00.30
> > * kernel: 2.6.9-55.ELsmp
> > * qla2xxx drivers: 8.01.07.15
> > * lin_tape (IBM tape driver) 1.7.1
> >
> > I installed bacula 2.2.4 + my '-dt' patch. Configured with
> > "--prefix=/opt/bacula --with-postgresql=/opt/pgsql-8.1/ --with-openssl".
> >
> > My tape devices are /dev/IBMtape0n and /dev/IBMtape1n. Autochanger is
> > accessible through /dev/sg6:
> >
> > ******************************************************
> >
> > [EMAIL PROTECTED] ~]# tapeinfo -f /dev/sg4
> > Product Type: Tape Drive
> > Vendor ID: 'IBM     '
> > Product ID: 'ULT3580-TD3     '
> > Revision: '69U2'
> > Attached Changer: No
> > SerialNumber: 'xxxxxxxxxx'
> > MinBlock:1
> > MaxBlock:16777215
> > SCSI ID: 0
> > SCSI LUN: 0
> > Ready: yes
> > BufferedMode: yes
> > Medium Type: 0x38
> > Density Code: 0x44
> > BlockSize: 0
> > DataCompEnabled: yes
> > DataCompCapable: yes
> > DataDeCompEnabled: yes
> > CompType: 0x1
> > DeCompType: 0x1
> > Block Position: 2
> >
> > [EMAIL PROTECTED] ~]# tapeinfo -f /dev/sg5
> > Product Type: Tape Drive
> > Vendor ID: 'IBM     '
> > Product ID: 'ULT3580-TD3     '
> > Revision: '69U2'
> > Attached Changer: No
> > SerialNumber: 'xxxxxxxxxx'
> > MinBlock:1
> > MaxBlock:16777215
> > SCSI ID: 0
> > SCSI LUN: 0
> > Ready: no
> >
> > [EMAIL PROTECTED] ~]# tapeinfo -f /dev/sg6
> > Product Type: Medium Changer
> > Vendor ID: 'IBM     '
> > Product ID: '3576-MTL        '
> > Revision: '320G'
> > Attached Changer: No
> > SerialNumber: 'xxxxxxxxxxxx'
> > SCSI ID: 0
> > SCSI LUN: 1
> > Ready: yes
> >
> > [EMAIL PROTECTED] ~]# mt -f /dev/IBMtape0n status
> > SCSI 2 tape drive:
> > File number=-1, block number=0, partition=0.
> > Tape block size 0 bytes. Density code 0x44 (no translation).
> > Soft error count since last status=0
> > General status bits on (41000000):
> >  BOT ONLINE
> >
> > [EMAIL PROTECTED] ~]# mt -f /dev/IBMtape1n status
> > SCSI 2 tape drive:
> > File number=-1, block number=-1, partition=0.
> > Tape block size 0 bytes. Density code 0x44 (no translation).
> > Soft error count since last status=0
> > General status bits on (40000):
> >  DR_OPEN
> >
> > ******************************************************
> >
> > I set up my bacula-sd.conf according to your suggestion. This is the
> > part related to my library:
> >
> > ******************************************************
> >
> > Device {
> >   Name = "Dev:TS3310-drv0"
> >   Device Type = tape
> >   Media Type = LTO-3
> >   Archive Device = /dev/IBMtape0n
> >   AutomaticMount = yes;               # when device opened, read it
> >   AlwaysOpen = yes;
> >   RemovableMedia = yes;
> >   RandomAccess = no;
> > #  Block positioning = no;
> > #  Hardware End of Medium = No
> > #  Fast Forward Space File = No
> > #  BSF at EOM = yes
> > #  TWO EOF = Yes
> >   Autochanger = yes;
> >   Drive Index = 0;
> > }
> >
> > Device {
> >   Name = "Dev:TS3310-drv1"
> >   Device Type = tape
> >   Media Type = LTO-3
> >   Archive Device = /dev/IBMtape1n
> >   AutomaticMount = yes;               # when device opened, read it
> >   AlwaysOpen = yes;
> >   RemovableMedia = yes;
> >   RandomAccess = no;
> > #  Block positioning = no;
> > #  Hardware End of Medium = No
> > #  Fast Forward Space File = No
> > #  BSF at EOM = yes
> > #  TWO EOF = Yes
> >   Autochanger = yes;
> >   Drive Index = 1;
> > }
> >
> > Autochanger {
> >   Name = "Achg:TS3310"
> >   Device = "Dev:TS3310-drv0"
> >   Device = "Dev:TS3310-drv1"
> >   Changer Device = /dev/sg6
> >   Changer Command = "/opt/bacula/etc/mtx-changer %c %o %S %a %d"
> > }
> >
> > ******************************************************
> >
> > Now I started again with btape. The test went OK, but some steps were
> > repeated with extra options. Finally I got the following recommendation:
> >
> > ******************************************************
> >
> > It looks like the test worked this time, please add:
> >
> >     Hardware End of Medium = No
> >     Fast Forward Space File = No
> >     BSF at EOM = yes
> >
> > to your Device resource in the Storage conf file.
> >
> > ******************************************************
> >
> > Attached is a log of the btape output, run with '-d100 -dt'. If
> > some other debug level would be more suitable, let me know.
> >
> > The problem is that I used these options previously, and bacula-sd still
> > had problems writing to tapes.
> >
> > ------
> >
> > On the IBM site there is a document about the IBM tape driver:
> > ftp://ftp.software.ibm.com/storage/devdrvr/Doc/IBM_Tape_Driver_PROGREF.pdf.
> > Starting from page 117 there is information about Linux systems. I
> > read it once, but I know almost nothing about bacula internals nor how
> > to use mt devices. I am not sure if it is important or helpful, but if
> > you could please spend a few minutes on this, perhaps you will find
> > something that IBM does "some other way" and that causes such problems.
> >
> > Regards,
> >
> > Mariusz Czulada
> >

--- bacula-2.2.4/src/stored/dev.c       2007-06-28 13:57:03.000000000 +0200
+++ bacula-2.2.4-dt/src/stored/dev.c    2007-11-02 14:22:21.000000000 +0100
@@ -82,6 +82,8 @@
 #include "bacula.h"
 #include "stored.h"

+#include <sys/IBM_tape.h>
+
 #ifndef O_NONBLOCK
 #define O_NONBLOCK 0
 #endif
@@ -1296,7 +1298,9 @@
       mt_com.mt_count = 1;
       while (num-- && !at_eot()) {
          Dmsg0(100, "Doing read before fsf\n");
-         if ((stat = this->read((char *)rbuf, rbuf_len)) < 0) {
+         stat = this->read((char *)rbuf, rbuf_len);
+         Dmsg2(1,"**MQ** state: %x, errno: %d\n", state, errno);
+         if (stat < 0) {
             if (errno == ENOMEM) {     /* tape record exceeds buf len */
                stat = rbuf_len;        /* This is OK */
             /*
@@ -1306,15 +1310,33 @@
             } else if (at_eof() && errno == ENOSPC) {
                stat = 0;
             } else {
-               berrno be;
-               set_eot();
-               clrerror(-1);
-               Dmsg2(100, "Set ST_EOT read errno=%d. ERR=%s\n", dev_errno,
-                  be.bstrerror());
-               Mmsg2(errmsg, _("read error on %s. ERR=%s.\n"),
-                  print_name(), be.bstrerror());
-               Dmsg1(100, "%s", errmsg);
-               break;
+               // MQ
+               struct request_sense sense_data;
+               int rc;
+               Dmsg0(100, "***MQ*** Issuing request sense...\n");
+               memset(&sense_data, 0, sizeof(struct request_sense));
+               rc = ioctl(m_fd, SIOC_REQSENSE, &sense_data);
+               if (rc == 0 && sense_data.err_code == 0x70 && sense_data.key == 0x08 &&
+                              sense_data.asc == 0x00 && sense_data.ascq == 0x05)
+               {
+                   Dmsg0(100, "***MQ*** Seems OK now...\n");
+                   stat = 0;
+               }
+               else
+               {
+                  Dmsg4(100, "***MQ*** sense error: err_code:%d, key:%d, asc:%d, ascq:%d\n",
+                     sense_data.err_code, sense_data.key, sense_data.asc, sense_data.ascq);
+
+                  berrno be;
+                  set_eot();
+                  clrerror(-1);
+                  Dmsg2(100, "Set ST_EOT read errno=%d. ERR=%s\n", dev_errno,
+                     be.bstrerror());
+                  Mmsg2(errmsg, _("read error on %s. ERR=%s.\n"),
+                     print_name(), be.bstrerror());
+                  Dmsg1(100, "%s", errmsg);
+                  break;
+               }
             }
          }
          if (stat == 0) {                /* EOF */