Re: [Bacula-users] Questions regarding migration job failure

2011-05-13 Thread Jerry Lowry

Thanks for your help and input.

I don't think the controller was or is causing the corruption.  The problem 
stems from my initial configuration of the storage and volumes, which caused 
the disks to fill up.  To avoid losing any backup data, I moved some of the 
volumes to another disk while reconfiguring the pools and storage.  While I 
was working on the reconfiguration, Bacula got to the point where it wanted 
to write to the volumes that I had moved, so the volume was deemed corrupt 
because Bacula could not find it.
So long as recycling the volumes clears the corrupt part of the volume, I 
think I should be okay.  I will just have to be more intelligent in my 
configuration of volumes and storage.


Thanks
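
For anyone hitting the same thing: a disk volume is a file named after the volume inside the device's Archive Device directory (the debug output in this thread shows bls opening /Home/home-0006). Below is a minimal Python sketch of a check to run before moving files around. It assumes you already have the list of volume names from the catalog (for example from `list volumes` in bconsole). The function name `missing_volumes` is made up for illustration and is not part of Bacula.

```python
from pathlib import Path


def missing_volumes(archive_device, volume_names):
    """Return the volume names that have no file under archive_device.

    archive_device: directory used as "Archive Device" (e.g. "/Home").
    volume_names:   names the catalog expects to find there (assumed input).
    """
    root = Path(archive_device)
    return [name for name in volume_names if not (root / name).is_file()]
```

Calling `missing_volumes("/Home", ["home-0006"])` returns the names Bacula would fail to open, so an empty list means every expected volume file is still in place.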

On 5/13/2011 2:25 AM, Martin Simmons wrote:

>> On Thu, 12 May 2011 09:58:14 -0700, Jerry Lowry said:
>>
>> thanks for the help.  Looks like I have some digging to do to figure out
>> what is actually happening.  I know that at one time I had some problems
>> with the raid controller.  I have since gotten that resolved.
>>
>> If the volume has been recycled will the corruption remain with the
>> volume or will it go by the wayside once the volume recycles?  Just
>> curious as to whether I should drop the corrupt volumes ( files ) and
>> create new ones.
>
> I would consider reformatting the whole partition -- if the raid controller
> was corrupting things, then there is no way to be sure that the filesystem is
> OK.
>
> __Martin

--
Achieve unprecedented app performance and reliability
What every C/C++ and Fortran developer should know.
Learn how Intel has extended the reach of its next-generation tools
to help boost performance applications - including clusters.
http://p.sf.net/sfu/intel-dev2devmay
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


--

---
Jerold Lowry
IT Manager / Software Engineer
Engineering Design Team (EDT), Inc. a HEICO company
1400 NW Compton Drive, Suite 315
Beaverton, Oregon 97006 (U.S.A.)
Phone: 503-690-1234 / 800-435-4320
Fax: 503-690-1243
Web: www.edt.com




Re: [Bacula-users] Questions regarding migration job failure

2011-05-13 Thread Martin Simmons
> On Thu, 12 May 2011 09:58:14 -0700, Jerry Lowry said:
> 
> thanks for the help.  Looks like I have some digging to do to figure out 
> what is actually happening.  I know that at one time I had some problems 
> with the raid controller.  I have since gotten that resolved.
> 
> If the volume has been recycled will the corruption remain with the 
> volume or will it go by the wayside once the volume recycles?  Just 
> curious as to whether I should drop the corrupt volumes ( files ) and 
> create new ones.

I would consider reformatting the whole partition -- if the raid controller
was corrupting things, then there is no way to be sure that the filesystem is
OK.

__Martin



Re: [Bacula-users] Questions regarding migration job failure

2011-05-13 Thread Graham Keeling
On Thu, May 12, 2011 at 09:58:14AM -0700, Jerry Lowry wrote:
> thanks for the help.  Looks like I have some digging to do to figure out  
> what is actually happening.  I know that at one time I had some problems  
> with the raid controller.  I have since gotten that resolved.
>
> If the volume has been recycled will the corruption remain with the  
> volume or will it go by the wayside once the volume recycles?  Just  
> curious as to whether I should drop the corrupt volumes ( files ) and  
> create new ones.

The corruption will definitely remain with the volume if you don't recycle it.

Bacula truncates the volumes when it recycles them, which means that the area
of the disk on which the problem occurred is free to be used by anything.

So if the problem is to do with bad areas of disk, then it could hit you again
at any time. Therefore not truncating them could avoid the problem since the
bad space is contained in a volume that you are not going to use again.

But if the problem is because of bacula itself corrupting the volume, it could
happen again at any time anyway, so truncating them isn't going to make any
difference.


Re: [Bacula-users] Questions regarding migration job failure

2011-05-12 Thread Jerry Lowry
Thanks for the help.  Looks like I have some digging to do to figure out 
what is actually happening.  I know that at one time I had some problems 
with the RAID controller.  I have since gotten that resolved.


If the volume has been recycled will the corruption remain with the 
volume or will it go by the wayside once the volume recycles?  Just 
curious as to whether I should drop the corrupt volumes ( files ) and 
create new ones.






Re: [Bacula-users] Questions regarding migration job failure

2011-05-12 Thread Graham Keeling
On Wed, May 11, 2011 at 02:06:44PM -0700, Jerry Lowry wrote:
> another mistake on my part.  You have to give bls the correct spelling  
> of the volume ( sometimes I wonder )
>
> Once I corrected the volume name this is the results I get:
>
> Volume Record: File:blk=0: 206 Sessid=16 SessTime=1303843290 Jobid=3  
> DataLen=171
> 11-May 13:42 bls JobId 0: Error: block.c:318 Volumne data error at 0:206!
> Block checksum mismatch in block=6010112 len=64512: calc=c6a6912d  
> blk=50a7d773

Well, that's the problem right there.
Your migration doesn't work when corrupted volumes are being read.

As to how your volumes got corrupted, that's a much harder question.

If it were me, I would start everything from scratch, and after every backup
run your 'bls' command on any volume that changed. This will let you catch
the problem just after it happened, and you might be able to spot anything
strange that happened before that.

(assuming that it is a bacula bug, rather than you having a disk or a file
system problem)
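
One way to automate that check, as a rough sketch rather than a tested tool. It assumes disk volumes are plain files in the Archive Device directory and reuses only the bls options that appear in this thread (-j, -V, -c). The helper names, the default config path and the "modified since" rule for deciding which volumes changed are assumptions, not anything Bacula provides.

```python
import subprocess
from pathlib import Path


def changed_volumes(archive_device, since):
    """Names of files under archive_device modified at or after `since` (epoch seconds)."""
    root = Path(archive_device)
    return sorted(p.name for p in root.iterdir()
                  if p.is_file() and p.stat().st_mtime >= since)


def bls_command(volume, device, config="/etc/bacula/bacula-sd.conf"):
    """Build a bls call using only the options seen in this thread."""
    return ["bls", "-j", "-V", volume, "-c", config, device]


def output_has_error(output):
    """True if any line looks like a bls error or checksum mismatch."""
    return any("Error:" in line or "checksum mismatch" in line
               for line in output.splitlines())


def check_volume(volume, device):
    """Run bls on one volume; True means no error was reported."""
    result = subprocess.run(bls_command(volume, device),
                            capture_output=True, text=True)
    combined = result.stdout + result.stderr
    return result.returncode == 0 and not output_has_error(combined)


# Example use after a backup (run on the SD host, as the user that can read the volumes):
#   import time
#   for name in changed_volumes("/Home", time.time() - 86400):
#       print(name, "ok" if check_volume(name, "/Home") else "PROBLEM")
```

Checking only the files whose modification time changed keeps the run short, and a failure then points at the backup that ran just before it.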



Re: [Bacula-users] Questions regarding migration job failure

2011-05-11 Thread Jerry Lowry
another mistake on my part.  You have to give bls the correct spelling 
of the volume ( sometimes I wonder )


Once I corrected the volume name these are the results I get:

Volume Record: File:blk=0: 206 Sessid=16 SessTime=1303843290 Jobid=3 DataLen=171
11-May 13:42 bls JobId 0: Error: block.c:318 Volumne data error at 0:206!
Block checksum mismatch in block=6010112 len=64512: calc=c6a6912d blk=50a7d773
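
To pull the numbers out of an error like this (for a log watcher, say), here is a small Python sketch. The regular expression is written against the exact message above and may not match other Bacula versions. As far as I can tell, calc is the checksum bls computed from the data it read and blk is the checksum stored in the block header, but treat that reading as an assumption. `parse_mismatch` is a hypothetical helper name.

```python
import re

# Matches the "Block checksum mismatch" line shown above, even when the
# mail client has wrapped it before "blk=".
MISMATCH = re.compile(
    r"Block checksum mismatch in block=(?P<block>\d+) len=(?P<len>\d+):"
    r"\s+calc=(?P<calc>[0-9a-f]+)\s+blk=(?P<blk>[0-9a-f]+)"
)


def parse_mismatch(text):
    """Return the fields of the first checksum-mismatch message, or None."""
    m = MISMATCH.search(text)
    if m is None:
        return None
    return {
        "block": int(m["block"]),   # block number reported by bls
        "len": int(m["len"]),       # block length in bytes
        "calc": m["calc"],          # checksum as printed after calc=
        "blk": m["blk"],            # checksum as printed after blk=
    }
```

Any two different values for calc and blk mean the block on disk no longer matches what was written, whatever the cause turns out to be.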


I ran this again with debug at level 200. I have attached the file with 
the output.


thanks for all your help!




[jlowry@distress-sd bin]$ ./bls -d 200 -j -v -v -V home-0006 -c /etc/bacula/bacula-sd.conf /Home
bls: stored_conf.c:698-0 Inserting director res: distress-mon
bls: stored_conf.c:698-0 Inserting device res: DBB
bls: stored_conf.c:698-0 Inserting device res: Hardware
bls: stored_conf.c:698-0 Inserting device res: Swift
bls: stored_conf.c:698-0 Inserting device res: Home
bls: stored_conf.c:698-0 Inserting device res: Workstations
bls: stored_conf.c:698-0 Inserting device res: TopSwap
bls: stored_conf.c:698-0 Inserting device res: MidSwap
bls: stored_conf.c:698-0 Inserting device res: BottomSwap
bls: stored_conf.c:698-0 Inserting device res: FileStorage
bls: stored_conf.c:698-0 Inserting device res: FileStorage1
bls: stored_conf.c:698-0 Inserting device res: Drive-1
bls: match.c:250-0 add_fname_to_include prefix=0 gzip=0 fname=/
bls: butil.c:281 Using device: "/Home" for reading.
bls: dev.c:284-0 init_dev: tape=0 dev_name=/Home
bls: vol_mgr.c:162-0 add read_vol=home-0006 JobId=0
bls: butil.c:186-0 Acquire device for read
bls: acquire.c:95-0 Want Vol=home-0006 Slot=0
bls: acquire.c:109-0 MediaType dcr= dev=File
bls: acquire.c:189-0 dir_get_volume_info vol=home-0006
bls: bls.c:486-0 Fake dir_get_volume_info
bls: mount.c:546-0 Must load "Home" (/Home)
bls: autochanger.c:120-0 Device "Home" (/Home) is not an autochanger
bls: acquire.c:220-0 bstored: open vol=home-0006
bls: dev.c:360-0 open dev: type=1 dev_name="Home" (/Home) vol=home-0006 mode=OPEN_READ_ONLY
bls: dev.c:369-0 call open_file_device mode=OPEN_READ_ONLY
bls: dev.c:2089-0 Enter mount
bls: dev.c:542-0 open disk: mode=OPEN_READ_ONLY open(/Home/home-0006, 0x0, 0640)
bls: dev.c:557-0 open dev: disk fd=3 opened, part=0/0, part_size=0
bls: dev.c:373-0 preserve=0x0 fd=3
bls: acquire.c:228-0 opened dev "Home" (/Home) OK
bls: acquire.c:231-0 calling read-vol-label
bls: l

Re: [Bacula-users] Questions regarding migration job failure

2011-05-11 Thread Jerry Lowry

Hi,

No, the migration job is occurring on the same storage daemon.  This 
storage daemon has 6 RAID devices set up as JBOD: 3 are for daily use and 
3 are set up as hotswap devices for off-site backups.  The problem is that 
when I run bls on the storage daemon where the disks are located I get a 
message asking me to mount the disk, which is already mounted according 
to the director, as well as being mounted by the OS.







Re: [Bacula-users] Questions regarding migration job failure

2011-05-11 Thread Phil Stracchino
On 05/11/11 13:48, Jerry Lowry wrote:
> Ok, I am trying to run bls on the specified volume file that is
> associated with this job. But the problem I am having is that bls is
> failing trying to stat the device. 
> 
> I have one director and two storage daemons.  The volume I am trying
> to run against is on the second SD.  Do I run bls on the system where
> the 'director' is or on the system that's running the standalone 'sd'
> where the volume is located?

Jerry,
If I'm understanding you correctly, you have two storage daemons, and
you're trying to do a migration from a device on one SD to a device on
the other.  Is this correct?

If this understanding is correct, sorry, it won't work.  Copy and
migration can currently only be done between devices controlled by the
same SD.  (This is in large part a result of there being no current
capability for direct communication between one storage daemon and another.)


-- 
  Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355
  ala...@caerllewys.net   ala...@metrocast.net   p...@co.ordinate.org
  Renaissance Man, Unix ronin, Perl hacker, SQL wrangler, Free Stater
 It's not the years, it's the mileage.



Re: [Bacula-users] Questions regarding migration job failure

2011-05-11 Thread Jerry Lowry

Sorry, forgot to add this.
When I run bls on the second SD it asks me to mount the volume on the 
device specified. But when I go to the director and try to mount the 
device it says that it is always mounted because the device is a disk.


bls -j -V Home-0006 /Home

this uses the bacula-sd.conf in the current directory.

Device {
   Name = Home
   Media Type = File
   Archive Device = /Home
   LabelMedia = yes;
   Random Access = yes;
   AutomaticMount = yes
   Removable Media = no;
   AlwaysOpen = no;
}
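
Worth noting that the command above asks for Home-0006, while the debug output elsewhere in the thread opens /Home/home-0006, which fits the "correct spelling" mistake mentioned in a later message. Here is a hedged Python sketch that looks for a volume file and, if the exact name is missing, suggests names that differ only in case. `find_volume` is a hypothetical helper, not a Bacula command, and it assumes the volumes are plain files directly under the Archive Device directory.

```python
import os


def find_volume(archive_device, wanted):
    """Return (exact_match, near_matches) for a volume name.

    exact_match:  True if a directory entry is named exactly `wanted`.
    near_matches: other entries whose names equal `wanted` ignoring case.
    """
    names = os.listdir(archive_device)
    exact = wanted in names
    near = sorted(n for n in names
                  if n != wanted and n.lower() == wanted.lower())
    return exact, near
```

With a file called home-0006 in /Home, `find_volume("/Home", "Home-0006")` reports no exact match and offers home-0006 as the likely intended name.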






Re: [Bacula-users] Questions regarding migration job failure

2011-05-11 Thread Jerry Lowry
Ok, I am trying to run bls on the specified volume file that is 
associated with this job. But the problem I am having is that bls is 
failing trying to stat the device.


I have one director and two storage daemons.  The volume I am trying 
to run against is on the second SD.  Do I run bls on the system where 
the 'director' is or on the system that's running the standalone 'sd' 
where the volume is located?


thanks


--
Achieve unprecedented app performance and reliability
What every C/C++ and Fortran developer should know.
Learn how Intel has extended the reach of its next-generation tools
to help boost performance applications - inlcuding clusters.
http://p.sf.net/sfu/intel-dev2devmay
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users



Re: [Bacula-users] Questions regarding migration job failure

2011-05-11 Thread Graham Keeling
On Wed, May 11, 2011 at 09:19:49AM -0700, Jerry Lowry wrote:
> I have not tried to restore from that particular job as yet, but the  
> next question would be, if it fails on the restore that would mean that  
> anything backed up in that job would not be valid, correct?

I think that depends upon what you mean by valid.

For example, it might be possible to skip over the bad area of the volume and
restore some files past that bad area.

If it were me, I have to say that I would indeed be treating the whole job as
suspicious. And the others too, probably.

But let's not get ahead of ourselves. Perhaps the volume is actually fine and
the problem is something else.

Rather than doing a restore, maybe it would be worth running commands like
'bls' on the volume first. It would probably give a quicker diagnosis, if
there is a problem.
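
As a rough illustration of the bls suggestion above, here is a minimal
Python sketch that assembles a bls command line for one volume.  The
volume name "hardware-0007" and the archive device "/Hardware" come from
the job log in this thread; the config path /etc/bacula/bacula-sd.conf and
the helper name build_bls_command are assumptions made for the example.
bls reads the storage daemon's configuration and opens the device
directly, so the command would be run on the host whose SD holds the
volume.

```python
import shlex


def build_bls_command(volume, device,
                      config="/etc/bacula/bacula-sd.conf",  # assumed path
                      list_jobs=True):
    """Return the argv list for a bls run against a single volume.

    build_bls_command is a hypothetical helper, not part of Bacula.
    """
    argv = ["bls", "-c", config]
    if list_jobs:
        argv.append("-j")          # list job records rather than every file
    argv += ["-V", volume, device]  # -V names the volume, device comes last
    return argv


cmd = build_bls_command("hardware-0007", "/Hardware")
print(shlex.join(cmd))
```

Dropping -j makes bls list the saved files instead, which reads the whole
volume and is therefore more likely to hit a damaged block.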




Re: [Bacula-users] Questions regarding migration job failure

2011-05-11 Thread Jerry Lowry
I have not tried to restore from that particular job as yet, but the 
next question would be, if it fails on the restore that would mean that 
anything backed up in that job would not be valid, correct?


thanks

On 5/11/2011 8:54 AM, Graham Keeling wrote:
> On Wed, May 11, 2011 at 08:44:18AM -0700, Jerry Lowry wrote:
>> Is there anyone that can help me with this problem?  Surely someone is
>> using the migration job.
>
> I'm not using migration jobs, but maybe I can give you a hint...
>
>> On 5/9/2011 2:51 PM, jerry lowry wrote:
>>> 09-May 13:59 distress-sd-sd JobId 2549: Forward spacing Volume
>>> "hardware-0007" to file:block 0:215.
>>> 09-May 13:59 distress-sd-sd JobId 2549: Error: block.c:275 Volume data error
>>> at 0:215! Wanted ID: "BB02", got "2". Buffer discarded.
>
> It seems to me that the error is not with the write to the new volume, but with
> the read from the existing volume "hardware-0007".
>
> I've seen similar errors before, when I found bugs in bacula that trashed the
> data on my disk volumes.
>
> One thing to try is a restore from "hardware-0007". I predict that you will
> get the same error.



Re: [Bacula-users] Questions regarding migration job failure

2011-05-11 Thread Graham Keeling
On Wed, May 11, 2011 at 08:44:18AM -0700, Jerry Lowry wrote:
> Is there anyone that can help me with this problem?  Surely someone is  
> using the migration job.

I'm not using migration jobs, but maybe I can give you a hint...

> On 5/9/2011 2:51 PM, jerry lowry wrote:
>> 09-May 13:59 distress-sd-sd JobId 2549: Forward spacing Volume
>> "hardware-0007" to file:block 0:215.
>> 09-May 13:59 distress-sd-sd JobId 2549: Error: block.c:275 Volume data error
>> at 0:215! Wanted ID: "BB02", got "2". Buffer discarded.

It seems to me that the error is not with the write to the new volume, but with
the read from the existing volume "hardware-0007".

I've seen similar errors before, when I found bugs in bacula that trashed the
data on my disk volumes.

One thing to try is a restore from "hardware-0007". I predict that you will
get the same error.
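
To make the "Wanted ID" message a little more concrete, here is a
minimal Python sketch of the kind of check the storage daemon appears to
be doing when it reads a block.  It assumes the 24-byte BB02 block header
layout as I recall it from the Bacula developer documentation (CheckSum,
BlockSize, BlockNumber, a 4-byte ID, VolSessionId, VolSessionTime, all
big-endian) and that blocks in a disk volume follow one another with no
gaps.  The names scan_blocks and make_block, and the synthetic data, are
made up for the example; this is not Bacula's own code and it does not
verify checksums.

```python
import struct

# Assumed BB02 header: CheckSum, BlockSize, BlockNumber, ID,
# VolSessionId, VolSessionTime (big-endian, 24 bytes in total).
HEADER = struct.Struct(">III4sII")
WANTED_ID = b"BB02"


def scan_blocks(data):
    """Yield (index, offset, block_id, ok) per block; stop at the first bad one."""
    offset, index = 0, 0
    while offset + HEADER.size <= len(data):
        _cksum, size, _num, block_id, _sid, _stime = HEADER.unpack_from(data, offset)
        ok = block_id == WANTED_ID and size >= HEADER.size
        yield index, offset, block_id, ok
        if not ok:
            return  # BlockSize cannot be trusted, so the next offset is unknown
        offset += size  # BlockSize includes the header itself
        index += 1


def make_block(number, payload, block_id=WANTED_ID):
    """Build a synthetic block for the demonstration (checksum left as 0)."""
    size = HEADER.size + len(payload)
    return HEADER.pack(0, size, number, block_id, 1, 1) + payload


# Two good blocks, then one whose ID field has been overwritten.
volume = (make_block(0, b"a" * 10)
          + make_block(1, b"b" * 5)
          + make_block(2, b"c" * 3, block_id=b"2\x00\x00\x00")
          + make_block(3, b"d"))

results = list(scan_blocks(volume))
for index, offset, block_id, ok in results:
    print(index, offset, block_id, "ok" if ok else "BAD ID")
```

On a real volume the same loop would be fed the bytes of the volume file;
a bad ID at some block means whatever was written there is not a Bacula
block header, which points at the source volume rather than at the
destination.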




Re: [Bacula-users] Questions regarding migration job failure

2011-05-11 Thread Jerry Lowry
Is there anyone that can help me with this problem?  Surely someone is 
using the migration job.




[Bacula-users] Questions regarding migration job failure

2011-05-09 Thread jerry lowry

Hi,

I am frequently getting errors on my migration jobs and I need some help
trying to figure out what the problem is.


I have three migration jobs that migrate data from a daily disk to a
RAID disk that is set up as a hot-swap disk.  Once this is full I pull the
disk and move it to an offsite facility.  About half of the time the
migration jobs work without any problems; the other half I get errors
on many of the jobs that are being migrated.  Example:  I start a
migrate job and it starts to migrate 6 jobs to the offsite disk.  It
gets through two of the jobs successfully and then the last four
jobs fail with the error below.  Each of the media is created
using BAT or BConsole without errors.


I have no clue as to what the problem might be, so any help would be
appreciated.

Below you will find the config files and job output.

thanks,
jerry

Job error:

09-May 12:55 distress-dir JobId 2549: The following 3 JobIds were chosen to be migrated: 2335,2328,2291
09-May 12:55 distress-dir JobId 2549: Job queued. JobId=2550
09-May 12:55 distress-dir JobId 2549: Migration JobId 2550 started.
09-May 12:55 distress-dir JobId 2549: Job queued. JobId=2552
09-May 12:55 distress-dir JobId 2549: Migration JobId 2552 started.
09-May 12:55 distress-dir JobId 2549: Migration using JobId=2291 Job=BackupHardware.2011-04-17_20.05.00_17
09-May 12:55 distress-dir JobId 2549: Bootstrap records written to /var/run/bacula/working/distress-dir.restore.53.bsr
09-May 13:59 distress-dir JobId 2549: Start Migration JobId 2549, Job=CopyHWDiskToDisk.2011-05-09_12.55.37_45
09-May 13:59 distress-dir JobId 2549: Using Device "TopSwap"
09-May 13:59 distress-sd-sd JobId 2549: Ready to read from volume "hardware-0007" on device "Hardware" (/Hardware).
09-May 13:59 distress-sd-sd JobId 2549: Volume "hardwareBS-2" previously written, moving to end of data.
09-May 13:59 distress-sd-sd JobId 2549: Ready to append to end of Volume "hardwareBS-2" size=240021666918
09-May 13:59 distress-sd-sd JobId 2549: Forward spacing Volume "hardware-0007" to file:block 0:215.
09-May 13:59 distress-sd-sd JobId 2549: Error: block.c:275 Volume data error at 0:215! Wanted ID: "BB02", got "2". Buffer discarded.
09-May 13:59 distress-dir JobId 2549: Error: Bacula distress-dir 5.0.1 (24Feb10): 09-May-2011 13:59:15
  Build OS:               x86_64-unknown-linux-gnu redhat
  Prev Backup JobId:      2291
  Prev Backup Job:        BackupHardware.2011-04-17_20.05.00_17
  New Backup JobId:       2554
  Current JobId:          2549
  Current Job:            CopyHWDiskToDisk.2011-05-09_12.55.37_45
  Backup Level:           Full
  Client:                 distress-sd-fd
  FileSet:                "Top Set" 2011-03-30 10:42:47
  Read Pool:              "HardwarePool" (From Job resource)
  Read Storage:           "hardware" (From command line)
  Write Pool:             "OffsiteTop" (From Job Pool's NextPool resource)
  Write Storage:          "topswap" (From Storage from Pool's NextPool resource)
  Catalog:                "MyCatalog" (From Client resource)
  Start time:             09-May-2011 13:59:15
  End time:               09-May-2011 13:59:15
  Elapsed time:           0 secs
  Priority:               10
  SD Files Written:       0
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Volume name(s):
  Volume Session Id:      27
  Volume Session Time:    1304722130
  Last Volume Bytes:      0 (0 B)
  SD Errors:              1
  SD termination status:  Running
  Termination:            *** Migration Error ***
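
For anyone who has to sift through many of these job reports, here is a
minimal Python sketch that pulls the volume being read and the details of
each "Volume data error" out of the log text, whether or not a mail
client has wrapped the lines.  The regular expressions are modelled only
on the two messages shown in this job report, and the function name
find_volume_errors is made up for the example; other Bacula versions may
word these messages differently.

```python
import re

# Patterns modelled on the messages in this job report only.
READ_RE = re.compile(r'Ready to read from volume "([^"]+)"')
ERROR_RE = re.compile(
    r'Volume data error at (?P<file>\d+):(?P<block>\d+)!\s+'
    r'Wanted ID:\s+"(?P<wanted>[^"]*)", got "(?P<got>[^"]*)"'
)


def find_volume_errors(log_text):
    """Return (volumes_read, errors) found in a Bacula job log."""
    flat = " ".join(log_text.split())  # undo line wrapping
    volumes = READ_RE.findall(flat)
    errors = [m.groupdict() for m in ERROR_RE.finditer(flat)]
    return volumes, errors


sample = '''09-May 13:59 distress-sd-sd JobId 2549: Ready to read from volume "hardware-0007" on
device "Hardware" (/Hardware).
09-May 13:59 distress-sd-sd JobId 2549: Error: block.c:275 Volume data error at 0:215! Wanted ID:
"BB02", got "2". Buffer discarded.'''

volumes, errors = find_volume_errors(sample)
print(volumes)
print(errors)
```

Pairing the volume read with the file:block position of the error shows
at a glance which source volume to inspect first.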


Configuration files (this is one of three; they are all set up the same way):

Job {
  Name = "CopyHWDiskToDisk"
  Type = Migrate
  Level = Full
  FileSet = "Top Set"
  Client = distress-sd-fd
  Messages = Standard
  Storage = hardware
  Pool = HardwarePool
  Maximum Concurrent Jobs = 4
  Selection Type = Pool Time
  Selection Pattern = "hardwareTS-*"
}

# File Pool definition
Pool {
  Name = OffsiteTop
  Pool Type = Migrate
  Next Pool = OffsiteTop
  Storage = topswap
  Recycle = yes                  # Bacula can automatically recycle Volumes
  AutoPrune = yes                # Prune expired volumes
  Volume Retention = 6 months    # Keep volumes for six months
  Maximum Volume Bytes = 1800G   # Limit Volume size to something reasonable
  Maximum Volumes = 10           # Limit number of Volumes in Pool
}

FileSet {
  Name = "Top Set"
  Include {
    Options {
      signature = MD5
    }
#
#  Put your list of files here, preceded by 'File =', one per line
#or include an external list with:
#
#File =