Dear All,
I'd like to let you know that I have finally solved the problem.
Here is a short description of how I traced it.
1. With '-d100' I confirmed that the main problem
is in the src/stored/dev.c code. During FSF to EOD,
in the last step (the read before fsf), I got:
------------------------
(...)
Sd:pl-admbackup01: dev.c:1348 Doing read before fsf
Sd:pl-admbackup01: dev.c:1363 Issuing request sense...
Sd:pl-admbackup01: dev.c:1410 Set ST_EOT read errno=5. ERR=Input/output
error
Sd:pl-admbackup01: dev.c:1413 dev.c:1412 read error on "Dev:TS3310-drv0"
(/dev/IBMtape0n). ERR=Input/output error.
(...)
------------------------
AFAIK errno=5 (EIO) only means "something went wrong". So I
decided to find out what exactly goes wrong in this operation.
I assumed this was an IBM-specific case, so I started digging
through the IBM site.
I read the "IBM Tape Device Drivers Programming Reference",
found the chapter about the SIOC_REQSENSE command (ioctl),
and added some diagnostic code to the source. The next time
I started the SD, I got this in the SD log:
------------------------
(...)
Sd:pl-admbackup01: dev.c:1293 fsf
Sd:pl-admbackup01: dev.c:1338 FSF has cap_fsf
Sd:pl-admbackup01: dev.c:1348 Doing read before fsf
Sd:pl-admbackup01: dev.c:1363 Issuing request sense...
Information Field Valid Bit-----1
Error Code----------------------0x70
Segment Number------------------0x00
filemark Detected Bit----------0
End Of Medium Bit---------------0
Illegal Length Indicator Bit----0
Sense Key-----------------------0x08
Information Bytes-------------0x00 0x00 0xfc 0x00
Additional Sense Length---------0x58
Command Specific Information----0x00 0x00 0x00 0x00
Additional Sense Code-----------0x00
Additional Sense Code Qualifier-0x05
Field Replaceable Unit Code-----0x30
Sense Key Specific Valid Bit----0
Sd:pl-admbackup01: dev.c:1410 Set ST_EOT read errno=5. ERR=Input/output
error
Sd:pl-admbackup01: dev.c:1413 dev.c:1412 read error on "Dev:TS3310-drv0"
(/dev/IBMtape0n). ERR=Input/output error.
Sd:pl-admbackup01: dev.c:1466 Return -1 from FSF
(...)
-------------------------
OK, now I had more details. I opened the "IBM System Storage
TS3310 Tape Library Setup and Operator Guide" and found
this:
* About "Sense key": 8 --- Blank Check
* About "ASC/ASCQ" : 00 05 - EOD: Read or Space command
terminated early because End of Data was encountered
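For readers who want to see where those two values sit in the raw data, here is a minimal sketch of decoding SCSI fixed-format sense data, the layout that response code 0x70 indicates. The struct and function names are made up for illustration; this is not the IBM driver's `struct request_sense`, and the byte offsets follow the generic SCSI layout rather than anything taken from the IBM header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields discussed above. */
struct sense_fields {
    int     valid;       /* byte 0, bit 7: Information field valid */
    uint8_t error_code;  /* byte 0, bits 6..0: 0x70 = current, 0x71 = deferred */
    int     filemark;    /* byte 2, bit 7 */
    int     eom;         /* byte 2, bit 6: End Of Medium */
    int     ili;         /* byte 2, bit 5: Illegal Length Indicator */
    uint8_t sense_key;   /* byte 2, bits 3..0 */
    uint8_t asc;         /* byte 12: Additional Sense Code */
    uint8_t ascq;        /* byte 13: Additional Sense Code Qualifier */
};

/*
 * Decode a fixed-format sense buffer.
 * Returns 0 on success, -1 if the buffer is too short to hold ASC/ASCQ
 * or does not carry a fixed-format response code.
 */
int decode_fixed_sense(const uint8_t *buf, size_t len, struct sense_fields *out)
{
    uint8_t code;

    if (buf == NULL || out == NULL || len < 14) {
        return -1;
    }
    code = buf[0] & 0x7f;
    if (code != 0x70 && code != 0x71) {
        return -1;
    }
    out->valid      = (buf[0] >> 7) & 1;
    out->error_code = code;
    out->filemark   = (buf[2] >> 7) & 1;
    out->eom        = (buf[2] >> 6) & 1;
    out->ili        = (buf[2] >> 5) & 1;
    out->sense_key  = buf[2] & 0x0f;
    out->asc        = buf[12];
    out->ascq       = buf[13];
    return 0;
}
```

Fed a buffer reconstructed from the dump above (first byte 0xf0, third byte 0x08, bytes 12 and 13 equal to 0x00 and 0x05), this should yield sense key 0x08 with ASC/ASCQ 00/05, i.e. the Blank Check / End of Data pair looked up in the guide.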
Aha! To me it is clear that this is a quasi-error,
because in our case it is exactly what we expect. I wrote
yet another patch: when the read before fsf fails, I check the
sense data, and if it matches the known case above, I set stat=0.
-------------------------
(...)
(The patch is not reproduced here; it can be found in the attachment.)
(...)
-------------------------
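The decision the patch makes can be sketched in isolation as below. This is a simplified stand-in, not the code from dev.c: `struct sense_summary` and `classify_failed_read` are hypothetical names, and the real patch obtains the sense data through the IBM driver's SIOC_REQSENSE ioctl, which is only represented here by its return code.

```c
#include <errno.h>

/* Hypothetical summary of the four sense fields the patch compares. */
struct sense_summary {
    unsigned char err_code;  /* 0x70 = current error, fixed format */
    unsigned char key;       /* sense key, 0x08 = Blank Check */
    unsigned char asc;       /* additional sense code */
    unsigned char ascq;      /* additional sense code qualifier */
};

/* Non-zero when the sense data says "Blank Check, End of Data" (00/05). */
int sense_is_end_of_data(int ioctl_rc, const struct sense_summary *s)
{
    return ioctl_rc == 0 &&
           s->err_code == 0x70 && s->key == 0x08 &&
           s->asc == 0x00 && s->ascq == 0x05;
}

/*
 * Decide what status a failed read (read() returned < 0) should map to.
 *   err       errno left by the failed read
 *   at_eof    non-zero if the device is already flagged as at EOF
 *   rbuf_len  size of the read buffer
 *   ioctl_rc  return code of the request-sense call
 * Returns rbuf_len when the record was merely larger than the buffer,
 * 0 when the failure really means EOF/EOD, and -1 for a genuine error.
 */
int classify_failed_read(int err, int at_eof, int rbuf_len,
                         int ioctl_rc, const struct sense_summary *s)
{
    if (err == ENOMEM) {
        return rbuf_len;          /* tape record exceeds buffer: acceptable */
    }
    if (at_eof && err == ENOSPC) {
        return 0;                 /* treated as end of file */
    }
    if (sense_is_end_of_data(ioctl_rc, s)) {
        return 0;                 /* quasi-error: drive reported End of Data */
    }
    return -1;                    /* real read error */
}
```

With the sense values from the log (0x70, 0x08, 0x00, 0x05) and errno EIO, this returns 0, so the caller can continue as if it had read a file mark. Any other sense data, or a failed request-sense call, still ends up in the error path.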
After applying this patch, fsf now finds EOD correctly:
-------------------------
(...)
Sd:pl-admbackup01: dev.c:1300 Doing read before fsf
Sd:pl-admbackup01: dev.c:1302 **MQ** state: 1040, errno: 5
Sd:pl-admbackup01: dev.c:1316 ***MQ*** Issuing request sense...
Sd:pl-admbackup01: dev.c:1322 ***MQ*** Seems OK now...
Sd:pl-admbackup01: dev.c:1340 End of File mark from read. File=11
Sd:pl-admbackup01: dev.c:1344 Set ST_EOT
Sd:pl-admbackup01: dev.c:1388 Return 0 from FSF
Sd:pl-admbackup01: dev.c:1390 ST_EOF set on exit FSF
Sd:pl-admbackup01: dev.c:1393 ST_EOT set on exit FSF
Sd:pl-admbackup01: dev.c:1395 Return from FSF file=10
(...)
-------------------------
The problem was only partially solved, because I still
could not store more than two jobs on a tape (if the tapes
were reloaded).
In fact, I was close to solving it in July. I made this
patch and sent it to the list (my post of 19 July),
but I still had the same problems. This time, though, I had
a 'Eureka!' moment.
The cause was the setting "BSF at EOM = yes": the number of scanned
files didn't match the number in the database. I removed
this line and, for about two weeks now, everything has worked
perfectly!!!
I only wonder why 'btape' insisted on that setting. Anyway,
limited trust was a good move in this case :-)
--
I don't know if reporting EOD in this manner is specific to the
IBM driver, but I am sending the patch to the list again.
Perhaps it could be reviewed/modified and applied to the main
tree, or added to a 'contrib' dir (which doesn't exist yet)
as a possible solution for cases similar to mine.
Regards,
Mariusz Czulada
P.S. This means that for me the problem is over, and the
IBM TS3310 library can now be treated as a supported
platform :-)
On 26-10-2007 at 18:38, Kern Sibbald wrote:
> Hello,
>
> Well, the test command is failing on a fsf command (forward space file).
> This means that there is something incorrectly set in the tape driver. Since
> you have an IBM system, it is possible that they have put your drive in SysV
> mode rather than BSD mode. Bacula works only in BSD mode, which is the default
> for Linux. Please see the Tape Testing chapter of the manual for details on
> verifying and setting the mode.
>
> Regards,
>
> Kern
>
> On Thursday 25 October 2007 15:20, Mariusz Czulada wrote:
> > Hi,
> >
> > Following the suggestions I was given, I set up the devices in the
> > configuration files and upgraded almost everything I could. Of course
> > (damn it) still nothing works as it should. Since some months have passed
> > since my first posts, I'll try to describe my problem from the start, and
> > stop at the first failure so far.
> >
> > To remind, my system is:
> > * xSeries 346, CentOS 4.4 (like RH ES4)
> > * Two DS4000 FC 4 Gb PCI-X Single Port HBA;
> > in system identified as QLogic QLA2460
> > - PCI-X 2.0 to 4Gb FC, Single Channel
> > * TS3310 library, with two ULT3580-TD3 drives,
> > connected directly (1 to 1) to HBAs.
> >
> > I upgraded QLogic firmware, patched kernel, upgraded QLogic drivers and
> > IBM tape tools. Now I have:
> > * QLogic BIOS Version: 1.29, FCode: 1.27, EFI: 1.09, Firmware: 4.00.30
> > * kernel: 2.6.9-55.ELsmp
> > * qla2xxx drivers: 8.01.07.15
> > * lin_tape (IBM tape driver) 1.7.1
> >
> > I installed bacula 2.2.4 + my '-dt' patch. Configured with
> > "--prefix=/opt/bacula --with-postgresql=/opt/pgsql-8.1/ --with-openssl".
> >
> > My tape devices are /dev/IBMtape0n and /dev/IBMtape1n. Autochanger is
> > accessible through /dev/sg6:
> >
> > ******************************************************
> >
> > [EMAIL PROTECTED] ~]# tapeinfo -f /dev/sg4
> > Product Type: Tape Drive
> > Vendor ID: 'IBM '
> > Product ID: 'ULT3580-TD3 '
> > Revision: '69U2'
> > Attached Changer: No
> > SerialNumber: 'xxxxxxxxxx'
> > MinBlock:1
> > MaxBlock:16777215
> > SCSI ID: 0
> > SCSI LUN: 0
> > Ready: yes
> > BufferedMode: yes
> > Medium Type: 0x38
> > Density Code: 0x44
> > BlockSize: 0
> > DataCompEnabled: yes
> > DataCompCapable: yes
> > DataDeCompEnabled: yes
> > CompType: 0x1
> > DeCompType: 0x1
> > Block Position: 2
> >
> > [EMAIL PROTECTED] ~]# tapeinfo -f /dev/sg5
> > Product Type: Tape Drive
> > Vendor ID: 'IBM '
> > Product ID: 'ULT3580-TD3 '
> > Revision: '69U2'
> > Attached Changer: No
> > SerialNumber: 'xxxxxxxxxx'
> > MinBlock:1
> > MaxBlock:16777215
> > SCSI ID: 0
> > SCSI LUN: 0
> > Ready: no
> >
> > [EMAIL PROTECTED] ~]# tapeinfo -f /dev/sg6
> > Product Type: Medium Changer
> > Vendor ID: 'IBM '
> > Product ID: '3576-MTL '
> > Revision: '320G'
> > Attached Changer: No
> > SerialNumber: 'xxxxxxxxxxxx'
> > SCSI ID: 0
> > SCSI LUN: 1
> > Ready: yes
> >
> > [EMAIL PROTECTED] ~]# mt -f /dev/IBMtape0n status
> > SCSI 2 tape drive:
> > File number=-1, block number=0, partition=0.
> > Tape block size 0 bytes. Density code 0x44 (no translation).
> > Soft error count since last status=0
> > General status bits on (41000000):
> > BOT ONLINE
> >
> > [EMAIL PROTECTED] ~]# mt -f /dev/IBMtape1n status
> > SCSI 2 tape drive:
> > File number=-1, block number=-1, partition=0.
> > Tape block size 0 bytes. Density code 0x44 (no translation).
> > Soft error count since last status=0
> > General status bits on (40000):
> > DR_OPEN
> >
> > ******************************************************
> >
> > I set up my bacula-sd.conf according to your suggestion. This is the part
> > related to my library:
> >
> > ******************************************************
> >
> > Device {
> > Name = "Dev:TS3310-drv0"
> > Device Type = tape
> > Media Type = LTO-3
> > Archive Device = /dev/IBMtape0n
> > AutomaticMount = yes; # when device opened, read it
> > AlwaysOpen = yes;
> > RemovableMedia = yes;
> > RandomAccess = no;
> > # Block positioning = no;
> > # Hardware End of Medium = No
> > # Fast Forward Space File = No
> > # BSF at EOM = yes
> > # TWO EOF = Yes
> > Autochanger = yes;
> > Drive Index = 0;
> > }
> >
> > Device {
> > Name = "Dev:TS3310-drv1"
> > Device Type = tape
> > Media Type = LTO-3
> > Archive Device = /dev/IBMtape1n
> > AutomaticMount = yes; # when device opened, read it
> > AlwaysOpen = yes;
> > RemovableMedia = yes;
> > RandomAccess = no;
> > # Block positioning = no;
> > # Hardware End of Medium = No
> > # Fast Forward Space File = No
> > # BSF at EOM = yes
> > # TWO EOF = Yes
> > Autochanger = yes;
> > Drive Index = 1;
> > }
> >
> > Autochanger {
> > Name = "Achg:TS3310"
> > Device = "Dev:TS3310-drv0"
> > Device = "Dev:TS3310-drv1"
> > Changer Device = /dev/sg6
> > Changer Command = "/opt/bacula/etc/mtx-changer %c %o %S %a %d"
> > }
> >
> > ******************************************************
> >
> > Then I started again with btape. The test went OK, but some steps were
> > repeated with extra options. Finally I got the following recommendation:
> >
> > ******************************************************
> >
> > It looks like the test worked this time, please add:
> >
> > Hardware End of Medium = No
> > Fast Forward Space File = No
> > BSF at EOM = yes
> >
> > to your Device resource in the Storage conf file.
> >
> > ******************************************************
> >
> > Attached is a log of the btape output, run with '-d100 -dt'. If
> > some other debug level would be more suitable, let me know.
> >
> > The problem is that I used these options previously, and bacula-sd still
> > had problems writing to tapes.
> >
> > ------
> >
> > On the IBM site there is a document about the IBM tape driver:
> > ftp://ftp.software.ibm.com/storage/devdrvr/Doc/IBM_Tape_Driver_PROGREF.pdf.
> > Starting from page 117 there is information regarding Linux systems. I
> > read it once, but I know almost nothing about Bacula internals nor how to
> > use mt devices. I'm not sure if it is important or helpful, but if you
> > could please spend a few minutes on it, perhaps you will find something
> > that IBM does "some other way" and that causes such problems?
> >
> > Regards,
> >
> > Mariusz Czulada
> >
--- bacula-2.2.4/src/stored/dev.c 2007-06-28 13:57:03.000000000 +0200
+++ bacula-2.2.4-dt/src/stored/dev.c 2007-11-02 14:22:21.000000000 +0100
@@ -82,6 +82,8 @@
#include "bacula.h"
#include "stored.h"
+#include <sys/IBM_tape.h>
+
#ifndef O_NONBLOCK
#define O_NONBLOCK 0
#endif
@@ -1296,7 +1298,9 @@
mt_com.mt_count = 1;
while (num-- && !at_eot()) {
Dmsg0(100, "Doing read before fsf\n");
- if ((stat = this->read((char *)rbuf, rbuf_len)) < 0) {
+ stat = this->read((char *)rbuf, rbuf_len);
+ Dmsg2(1,"**MQ** state: %x, errno: %d\n", state, errno);
+ if (stat < 0) {
if (errno == ENOMEM) { /* tape record exceeds buf len */
stat = rbuf_len; /* This is OK */
/*
@@ -1306,15 +1310,33 @@
} else if (at_eof() && errno == ENOSPC) {
stat = 0;
} else {
- berrno be;
- set_eot();
- clrerror(-1);
- Dmsg2(100, "Set ST_EOT read errno=%d. ERR=%s\n", dev_errno,
- be.bstrerror());
- Mmsg2(errmsg, _("read error on %s. ERR=%s.\n"),
- print_name(), be.bstrerror());
- Dmsg1(100, "%s", errmsg);
- break;
+ // MQ
+ struct request_sense sense_data;
+ int rc;
+ Dmsg0(100, "***MQ*** Issuing request sense...\n");
+ memset(&sense_data, 0, sizeof(struct request_sense));
+ rc = ioctl(m_fd, SIOC_REQSENSE, &sense_data);
+ if (rc == 0 && sense_data.err_code == 0x70 && sense_data.key == 0x08 &&
+ sense_data.asc == 0x00 && sense_data.ascq == 0x05)
+ {
+ Dmsg0(100, "***MQ*** Seems OK now...\n");
+ stat = 0;
+ }
+ else
+ {
+ Dmsg4(100, "***MQ*** sense error: err_code:%d, key:%d, asc:%d, ascq:%d\n",
+ sense_data.err_code, sense_data.key, sense_data.asc, sense_data.ascq);
+
+ berrno be;
+ set_eot();
+ clrerror(-1);
+ Dmsg2(100, "Set ST_EOT read errno=%d. ERR=%s\n", dev_errno,
+ be.bstrerror());
+ Mmsg2(errmsg, _("read error on %s. ERR=%s.\n"),
+ print_name(), be.bstrerror());
+ Dmsg1(100, "%s", errmsg);
+ break;
+ }
}
}
if (stat == 0) { /* EOF */
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel