Re: [Bacula-users] Device

2011-07-10 Thread Mike Hobbs
On 7/10/2011 11:14 AM, Mike Hobbs wrote:
> I've had concurrent backups working fine for the past few days.  This
> morning I logged in to check my server and noticed that bacula/vchanger is
> only using 2 of my 4 virtual drives.  I don't understand why; could you
> point me in the right direction to find out why this is?

Very strange.  Most of today my server was using only 2 of 4 virtual 
drives.  Now, all 4 drives are back to being in use. :-\



Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

2011-07-10 Thread Steve Costaras

-Original Message-

>I suggest running smaller jobs. I don't mean to sound trite, but that really is 
>the solution. Given that the alternative is non-trivial, the sensible choice 
>is, I'm afraid, cancel the job.

I'm already kicking off 20+ jobs for a single system. This does not work when 
we're talking over the 100TB/nearly 200TB mark. And when these errors happen it 
does not matter how many jobs you have, as /all/ outstanding jobs fail when you 
have concurrency (in this case all jobs that were queued and were not even 
writing to the same tape were canceled).

> This sounds like a configuration issue. Queued jobs should not be cancelled 
> when a previous job cancels.

Not queued, concurrent jobs (all are active at the same time but only one 
writes at a time from its spool file). This was done to avoid the 
write|spool|write|spool loop for a serial job against a large system, cutting 
backup times in half.
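
For readers following along, a minimal sketch of the directives such a layout 
relies on (the directive names are standard Bacula; the job name, spool path, 
and sizes are illustrative placeholders, not Steve's actual configuration):

# bacula-dir.conf: each of the 20+ jobs spools its data before writing to tape
Job {
  Name = "big-fileset-part01"        # placeholder; one slice of the client
  JobDefs = "DefaultJob"
  Spool Data = yes                   # fill a disk spool first, despool to tape later
}

# bacula-sd.conf: the single tape device accepts many attached jobs but
# despools them one at a time
Device {
  Name = "LTO4"
  Archive Device = /dev/nst0
  Spool Directory = /spool/bacula    # placeholder path
  Maximum Spool Size = 800G          # roughly one LTO4 tape per spool file
  Maximum Concurrent Jobs = 20       # many jobs may spool while one despools
}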








Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

2011-07-10 Thread Dan Langille
Resending, with additional information.

On Jul 10, 2011, at 3:18 PM, Steve Costaras wrote:

>  
> -Original Message-
> From: Dan Langille [mailto:d...@langille.org]
> Sent: Sunday, July 10, 2011 12:58 PM
> To: stev...@chaven.com
> Cc: bacula-users@lists.sourceforge.net
> Subject: Re: [Bacula-users] Catastrophic error. Cannot write overflow block 
> to device "LTO4"
> 
> >> 
> >> 2) since everything is spooled first, there should be NO error that should 
> >> cancel a job. A tape drive could fail, a tape could burst into flame, all 
> >> that would be needed was bacula to know that there was an issue and give 
> >> the admin a simple statement do you want to fix the issue or cancel?, the 
> >> admin to fix the problem, and then bacula told to restart from the last 
> >> block that was stored successfully OR if need be from the beginning of 
> >> the spooled data file.
> 
> >This I do know. Although, at first glance it seems easy to do this, it is 
> >not. If it was trivial to do, I assure you, it would already be in place.
> 
> >> Canceling jobs that run for days for TB's of data is just screwed up.
> 
> >I suggest running smaller jobs. I don't mean to sound trite, but that really 
> >is the solution. Given that the alternative is non-trivial, the sensible 
> >choice is, I'm afraid, cancel the job.
> 
> I'm already kicking off 20+ jobs for a single system.   This does not 
> work when we're talking over the 100TB/nearly 200TB mark. And when these 
> errors happen it does not matter how many jobs you have, as /all/ outstanding 
> jobs fail when you have concurrency (in this case all jobs that were queued and 
> were not even writing to the same tape were canceled).  
This sounds like a configuration issue.  Queued jobs should not be cancelled 
when a previous job cancels.  FYI, I've never seen this happen on my systems.  
I think this is something you need to follow up on.
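
(One Job-resource knob worth checking in this area is the reschedule setting. 
The sketch below is untested and the values are placeholders; it addresses 
retrying a job that fails rather than the cancellation cascade itself:)

Job {
  Name = "example-job"            # placeholder name
  JobDefs = "DefaultJob"
  Reschedule On Error = yes       # re-queue the job if it terminates in error
  Reschedule Interval = 1 hour    # how long to wait before retrying
  Reschedule Times = 3            # give up after three attempts
}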

> This does not happen with any other enterprise backup software not that they 
> should be 100% mimicked.
> With the data sizes we have today I don't see why there are not better error 
> handling checks/routines.


This is open source software.  Stuff gets written because someone wants it.  
Clearly, nobody who wants this has written it yet. That is why it does not exist.

But sorry, that's not helping you find a solution.  James Harper has some good 
points. :)  I hope it leads somewhere.

-- 
Dan Langille - http://langille.org



Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

2011-07-10 Thread Dan Langille

On Jul 10, 2011, at 3:18 PM, Steve Costaras wrote:

>  
> -Original Message-
> From: Dan Langille [mailto:d...@langille.org]
> Sent: Sunday, July 10, 2011 12:58 PM
> To: stev...@chaven.com
> Cc: bacula-users@lists.sourceforge.net
> Subject: Re: [Bacula-users] Catastrophic error. Cannot write overflow block 
> to device "LTO4"
> 
> >> 
> >> 2) since everything is spooled first, there should be NO error that should 
> >> cancel a job. A tape drive could fail, a tape could burst into flame, all 
> >> that would be needed was bacula to know that there was an issue and give 
> >> the admin a simple statement do you want to fix the issue or cancel?, the 
> >> admin to fix the problem, and then bacula told to restart from the last 
> >> block that was stored successfully OR if need be from the beginning of 
> >> the spooled data file.
> 
> >This I do know. Although, at first glance it seems easy to do this, it is 
> >not. If it was trivial to do, I assure you, it would already be in place.
> 
> >> Canceling jobs that run for days for TB's of data is just screwed up.
> 
> >I suggest running smaller jobs. I don't mean to sound trite, but that really 
> >is the solution. Given that the alternative is non-trivial, the sensible 
> >choice is, I'm afraid, cancel the job.
> 
> I'm already kicking off 20+ jobs for a single system.   This does not 
> work when we're talking over the 100TB/nearly 200TB mark. And when these 
> errors happen it does not matter how many jobs you have, as /all/ outstanding 
> jobs fail when you have concurrency (in this case all jobs that were queued and 
> were not even writing to the same tape were canceled).  
This sounds like a configuration issue.  Queued jobs should not be cancelled 
when a previous job cancels.

> This does not happen with any other enterprise backup software not that they 
> should be 100% mimicked.
> With the data sizes we have today I don't see why there are not better error 
> handling checks/routines.


This is open source software.  Stuff gets written because someone wants it.  
Clearly, nobody who wants this has written it yet. That is why it does not exist.

-- 
Dan Langille - http://langille.org



Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

2011-07-10 Thread Steve Costaras


> Just had a quick look... the "read-only" message is this in stored/block.c:
>
> if (!dev->can_append()) {
> dev->dev_errno = EIO;
> Jmsg1(jcr, M_FATAL, 0, _("Attempt to write on read-only Volume. dev=%s\n"), 
> dev->print_name());
> return false;
> }
>
>And can_append() is:
>
>int can_append() const { return state & ST_APPEND; }
>
>so it does seem pretty basic unless there is a race somewhere in getting the 
>value of 'state'.
>
>Are there any kernel messages that might indicate a problem somewhere at that 
>time?


Nothing related to bacula/tape modules.   I am running zfsonlinux for the file 
system here, and there is a known bug with that causing soft lockups for 60-120 
seconds:  

[121423.079640] BUG: soft lockup - CPU#5 stuck for 61s! [z_wr_iss/5:5354]

The system recovers, though.  This normally happens at delete time (txg_sync), 
and since this was a new tape mount it would/could be close to the time when 
an old spool was being deleted (spool sizes are 800G, which is the same size as 
the LTO4 tape).  

I did not see anything like that happen at the time, though.  When it normally 
happens there is a complete system 'freeze' for a couple of seconds and then 
recovery; I was in via ssh, did not see that, and was able to umount and run 
btape commands.









Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

2011-07-10 Thread James Harper
> 
> no idea, if we can find out what triggered the original message. Without
> doing anything physical, I did an umount storage=LTO4 from bacula and then
> went and did a full btape rawfill without a single problem on the volume:
> 
> *status
>  Bacula status: file=0 block=1
>  Device status: ONLINE IM_REP_EN file=0 block=1
> btape: btape.c:2133 Device status: 641. ERR=
> *rewind
> btape: btape.c:578 Rewound "LTO4" (/dev/nst0)
> *rawfill
> btape: btape.c:2847 Begin writing raw blocks of 2097152 bytes.
> +++ (...)
> Write failed at block 384701. stat=-1 ERR=No space left on device
> btape: btape.c:410 Volume bytes=806.7 GB. Write rate = 106.1 MB/s
> btape: btape.c:608 Wrote 1 EOF to "LTO4" (/dev/nst0)
> *
> 
> zero problems at all.
> 

Just had a quick look... the "read-only" message is this in stored/block.c:

   if (!dev->can_append()) {
      dev->dev_errno = EIO;
      Jmsg1(jcr, M_FATAL, 0, _("Attempt to write on read-only Volume. dev=%s\n"),
            dev->print_name());
      return false;
   }

And can_append() is:

int can_append() const { return state & ST_APPEND; }

so it does seem pretty basic unless there is a race somewhere in getting the 
value of 'state'.

Are there any kernel messages that might indicate a problem somewhere at that 
time?

James


Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

2011-07-10 Thread Steve Costaras

no idea, if we can find out what triggered the original message. Without doing 
anything physical, I did an umount storage=LTO4 from bacula and then went and 
did a full btape rawfill without a single problem on the volume:

*status
 Bacula status: file=0 block=1
 Device status: ONLINE IM_REP_EN file=0 block=1
btape: btape.c:2133 Device status: 641. ERR=
*rewind
btape: btape.c:578 Rewound "LTO4" (/dev/nst0)
*rawfill
btape: btape.c:2847 Begin writing raw blocks of 2097152 bytes.
+++ (...)
Write failed at block 384701. stat=-1 ERR=No space left on device
btape: btape.c:410 Volume bytes=806.7 GB. Write rate = 106.1 MB/s
btape: btape.c:608 Wrote 1 EOF to "LTO4" (/dev/nst0)
*

zero problems at all.




-Original Message-
From: James Harper [mailto:james.har...@bendigoit.com.au]
Sent: Sunday, July 10, 2011 06:42 PM
To: stev...@chaven.com, bacula-users@lists.sourceforge.net
Subject: RE: [Bacula-users] Catastrophic error. Cannot write overflow block to 
device "LTO4"

> 
> 3000 OK label. VolBytes=1024 DVD=0 Volume="FA0016" Device="LTO4" (/dev/nst0)
> Requesting to mount LTO4 ...
> 3905 Bizarre wait state 7
> Do not forget to mount the drive!!!
> 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume "FA0016" on
> device "LTO4" (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: New volume "FA0016" mounted on device "LTO4"
> (/dev/nst0) at 10-Jul-2011 03:51.
> 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on
> read-only Volume. dev="LTO4" (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: End of medium on Volume "FA0016" Bytes=1,024
> Blocks=0 at 10-Jul-2011 03:51.

This probably isn't helpful, but why does Bacula think that the volume is 
read-only?

James



Re: [Bacula-users] Setting Priority

2011-07-10 Thread Ben Walton
Excerpts from Mike Hobbs's message of Fri Jul 08 15:23:47 -0400 2011:

Hi Mike,

> Due to the amount of data and machines I'll be backing up, I can see
> that when the first of the month comes around, running all my level
> 0 (Full) backups is going to take a week or more.  I'm concerned
> because if my level 1s (incremental) get queued up behind the level
> 0s data could potentially be lost.  So, I would like to assign my
> incremental a higher priority over Fulls, that way when the server
> is busy running Full backups, bacula will queue up the level 1's
> first and then continue on with the level 0s.  Am I making sense?

I don't think it works like this.  Depending on the number of
concurrent jobs you can run, the level 0's will queue up before the
level 1's are scheduled.  When the level 1's come around the next day,
they'd be placed in queue after the existing jobs.  (This is my
understanding of things.)

> Throughout my testing, I have only used one JobDef, the default one.
> I have edited it a little.  Because all my jobs use this JobsDef all
> the jobs, whether a Full or Incremental, gets assigned the default
> Priority 10.  I'm confused as to how I will configure bacula to have
> the Full backups say priority 10 and the incremental, say a Priority
> 9.

You can override priority for each job that uses the jobdefs on a
job-by-job basis, but you'll also need to make multiple jobs per
client (one for full, one for daily/differential) so that you can
assign different priorities.  Then, you'd need separate schedules to
run the different jobs at the appropriate time; see the sketch below.
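
A rough sketch of what that could look like (the client, schedule, and job 
names here are invented for illustration, not a tested configuration; only 
Level, Schedule, and Priority differ between the two jobs):

Job {
  Name = "clientA-full"
  JobDefs = "DefaultJob"
  Client = clientA-fd
  Level = Full
  Schedule = "MonthlyFull"        # fires at the start of the month
  Priority = 10
}
Job {
  Name = "clientA-incr"
  JobDefs = "DefaultJob"
  Client = clientA-fd
  Level = Incremental
  Schedule = "DailyIncremental"
  Priority = 9                    # lower number = higher priority, so these run first
}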

> Another question, sort of related.  As I said above, at the first of
> the month when all my Full backups fire off, it's going to keep the
> backup server very busy for a week or more.  Is there a way to
> configure bacula to run certain groups (Pools?) Full backups on
> different dates?  Say, I could have 1/2 my machines level 0 run on
> the 1st of each month and then the second 1/2 run on the 15th?  Or
> something like that.  I'd really like to have my Full backups done
> within a few days and not weeks.

You'd need to use different schedules for this.  I'm planning to set up
schedules here using something like FridayFull, SaturdayFull and
SundayFull and then splay my machines across those manually (see the
sketch below)...
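
For the 1st-vs-15th idea specifically, something along these lines should work 
(schedule names are invented and differentials are omitted for brevity; the Run 
syntax follows the examples already posted in this thread):

Schedule {
  Name = "FullGroupA"                   # first half of the machines
  Run = Full 1st sun at 23:05
  Run = Incremental mon-sat at 23:05
}
Schedule {
  Name = "FullGroupB"                   # second half, two weeks later
  Run = Full 3rd sun at 23:05
  Run = Incremental mon-sat at 23:05
}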

I hope this helps.  It's a bit of a thought exercise for me as I'm
also new to bacula and considering some of the same things you are.  I
hope one of the long-time users will correct any of the above if it's
not correct or subtly wrong.  (I sat on this reply hoping that
somebody more knowledgeable would jump in, but since nobody has, I'm
taking a crack at it.)

Thanks
-Ben
--
Ben Walton
Systems Programmer - CHASS
University of Toronto
C:416.407.5610 | W:416.978.4302




Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

2011-07-10 Thread James Harper
> 
> 3000 OK label. VolBytes=1024 DVD=0 Volume="FA0016" Device="LTO4" (/dev/nst0)
> Requesting to mount LTO4 ...
> 3905 Bizarre wait state 7
> Do not forget to mount the drive!!!
> 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume "FA0016" on
> device "LTO4" (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: New volume "FA0016" mounted on device "LTO4"
> (/dev/nst0) at 10-Jul-2011 03:51.
> 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on
> read-only Volume. dev="LTO4" (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: End of medium on Volume "FA0016" Bytes=1,024
> Blocks=0 at 10-Jul-2011 03:51.

This probably isn't helpful, but why does Bacula think that the volume is 
read-only?

James



[Bacula-users] Bacula Client for Mac OS X Snow Leopard

2011-07-10 Thread tscollins
Daniel,

You need to verify that you have the proper SDKs. To do this, look in the 
/Developer/SDKs/ directory; you should see MacOSX10.4u.sdk, MacOSX10.5.sdk 
and MacOSX10.6.sdk. If you don't see all three SDKs, grab a Snow Leopard DVD, 
open the 'Optional Installs' folder, and double-click both the Xcode.mpkg 
and the 'Optional Installs.mpkg' files. Once you have both of those installed, 
you should have no errors when running configure and make.

TSC






[Bacula-users] Mac OS X client fails to backup

2011-07-10 Thread tscollins
I'm new to Bacula and am having trouble getting backups of two Mac OS X clients 
to work. I think the problem is with my configuration and hope someone can 
correct my errors. Here are the conf files:

Mac1 bacula-fd.conf
Director {
  Name = dracula-dir
  Password = "blah"
}
Director {
  Name = dracula-mon
  Password = "blah"
  Monitor = yes
}
FileDaemon {
  Name = Mac1-fd
  FDport = 9102
  WorkingDirectory = /private/var/bacula/working
  Pid Directory = /var/run
  Maximum Concurrent Jobs = 20
}
Messages {
  Name = Standard
  director = dracula-dir = all, !skipped, !restored
}

Mac2 bacula-fd.conf
Director {
  Name = dracula-dir
  Password = "blah"
}
Director {
  Name = dracula-mon
  Password = "blah"
  Monitor = yes
}
FileDaemon {
  Name = Mac2
  FDport = 9102
  WorkingDirectory = /private/var/bacula/working
  Pid Directory = /var/run
  Maximum Concurrent Jobs = 20
}
Messages {
  Name = Standard
  director = Mac2-dir = all, !skipped, !restored
}

Server bacula-dir.conf
Director {
  Name = dracula-dir
  DIRport = 9101
  QueryFile = "/root/bacula/bin/query.sql"
  WorkingDirectory = "/root/bacula/bin/working"
  PidDirectory = "/root/bacula/bin/working"
  Maximum Concurrent Jobs = 1
  Password = "blah"
  Messages = Daemon
}
JobDefs {
  Name = "DefaultJob"
  Type = Backup
  Level = Incremental
  Client = dracula-fd 
  FileSet = "Full Set"
  Schedule = "WeeklyCycle"
  Storage = File
  Messages = Standard
  Pool = File
  Priority = 10
  Write Bootstrap = "/root/bacula/bin/working/%c.bsr"
}
Job {
  Name = "BackupClient1"
  JobDefs = "DefaultJob"
}
Job {
   Name = "Mac1-test"
   Client = Mac1-fd
   JobDefs = "DefaultJob"
   FileSet = "Mac1_test"
}
Job {
   Name = "Mac2-test"
   Client = Mac2-fd
   JobDefs = "DefaultJob"
   FileSet = "Mac2_test"
}
Job {
  Name = "BackupCatalog"
  JobDefs = "DefaultJob"
  Level = Full
  FileSet="Catalog"
  Schedule = "WeeklyCycleAfterBackup"
  RunBeforeJob = "/root/bacula/bin/make_catalog_backup.pl MyCatalog"
  RunAfterJob  = "/root/bacula/bin/delete_catalog_backup"
  Write Bootstrap = "/root/bacula/bin/working/%n.bsr"
  Priority = 11   # run after main backup
}
Job {
  Name = "RestoreFiles"
  Type = Restore
  Client=dracula-fd 
  FileSet="Full Set"  
  Storage = File  
  Pool = Default
  Messages = Standard
  Where = /tmp/bacula-restores
}
FileSet {
  Name = "Full Set"
  Include {
Options {
  signature = MD5
}
File = /root/bacula/bin
  }
  Exclude {
File = /root/bacula/bin/working
File = /tmp
File = /proc
File = /tmp
File = /.journal
File = /.fsck
  }
}
FileSet {
  Name = "Mac1_test"
  Include {
Options {
  hfsplussupport = yes
}
File = /Users/tscollins/Download
  }
}
FileSet {
  Name = "Mac2_test"
  Include {
Options {
  hfsplussupport = yes
}
File = /Users/Technology
  }
}
Schedule {
  Name = "WeeklyCycle"
  Run = Full 1st sun at 23:05
  Run = Differential 2nd-5th sun at 23:05
  Run = Incremental mon-sat at 23:05
}
Schedule {
  Name = "WeeklyCycleAfterBackup"
  Run = Full sun-sat at 23:10
}
FileSet {
  Name = "Catalog"
  Include {
Options {
  signature = MD5
}
File = "/root/bacula/bin/working/bacula.sql"
  }
}
Client {
  Name = dracula-fd
  Address = dracula
  FDPort = 9102
  Catalog = MyCatalog
  Password = "blah"
  File Retention = 30 days
  Job Retention = 6 months
  AutoPrune = yes
}
Client {
  Name = Mac1-fd
  Address = Mac1
  FDPort = 9102
  Catalog = MyCatalog
  Password = "blah"
  File Retention = 30 days
  Job Retention = 6 months
  AutoPrune = yes
}
Client {
  Name = Mac2-fd
  Address = Mac2
  FDPort = 9102
  Catalog = MyCatalog
  Password = "blah"
  File Retention = 30 days
  Job Retention = 6 months
  AutoPrune = yes
}
Storage {
  Name = File
  Address = dracula
  SDPort = 9103
  Password = "blah"
  Device = FileStorage
  Media Type = File
}
Catalog {
  Name = MyCatalog
  dbname = "bacula"; dbuser = "bacula"; dbpassword = ""
}
Messages {
  Name = Standard
  mailcommand = "/root/bacula/bin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" 
-s \"Bacula: %t %e of %c %l\" %r"
  operatorcommand = "/root/bacula/bin/bsmtp -h localhost -f \"\(Bacula\) 
\<%r\>\" -s \"Bacula: Intervention needed for %j\" %r"
  mail = root@localhost = all, !skipped
  operator = root@localhost = mount
  console = all, !skipped, !saved
  append = "/root/bacula/bin/working/log" = all, !skipped
  catalog = all
}
Messages {
  Name = Daemon
  mailcommand = "/root/bacula/bin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" 
-s \"Bacula daemon message\" %r"
  mail = root@localhost = all, !skipped
  console = all, !skipped, !saved
  append = "/root/bacula/bin/working/log" = all, !skipped
}
Pool {
  Name = Default
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 365 days
}
Pool {
  Name = File
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 365 days
  Maximum Volume Bytes = 50G
  Maxim

Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

2011-07-10 Thread Steve Costaras

-Original Message-
From: Dan Langille [mailto:d...@langille.org]
Sent: Sunday, July 10, 2011 12:58 PM
To: stev...@chaven.com
Cc: bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-users] Catastrophic error. Cannot write overflow block to 
device "LTO4"

>>
>> 2) since everything is spooled first, there should be NO error that should 
>> cancel a job. A tape drive could fail, a tape could burst into flame, all 
>> that would be needed was bacula to know that there was an issue and give 
>> the admin a simple statement do you want to fix the issue or cancel?, the 
>> admin to fix the problem, and then bacula told to restart from the last 
>> block that was stored successfully OR if need be from the beginning of the 
>> spooled data file.

>This I do know. Although, at first glance it seems easy to do this, it is not. 
>If it was trivial to do, I assure you, it would already be in place.

>> Canceling jobs that run for days for TB's of data is just screwed up.

>I suggest running smaller jobs. I don't mean to sound trite, but that really 
>is the solution. Given that the alternative is non-trivial, the sensible 
>choice is, I'm afraid, cancel the job.

I'm already kicking off 20+ jobs for a single system. This does not 
work when we're talking over the 100TB/nearly 200TB mark. And when these errors 
happen it does not matter how many jobs you have, as /all/ outstanding jobs fail 
when you have concurrency (in this case all jobs that were queued and were not 
even writing to the same tape were canceled). This does not happen with any 
other enterprise backup software, not that they should be 100% mimicked. With 
the data sizes we have today I don't see why there are not better error 
handling checks/routines.







Re: [Bacula-users] Copy/Migration utilization

2011-07-10 Thread Harald Schmalzbauer
schrieb Harald Schmalzbauer am 10.07.2011 16:28 (localtime):
> schrieb Adrian Reyer am 10.07.2011 15:58 (localtime):
> ...
>> I found using Spool Data for copy jobs to be faster for my setup. I have
>> fast local disks for spooling, but some of my disk storage is accessed
>> vie iSCSI on 1-GBit/s-links.
>> However, I am currently running a few copy jobs and the limiting factor
>> seems to be my bacula-sd consuming 1 complete CPU, throttling me at
>> 55MB/s. The CPU is an older 'AMD Athlon(tm) 64 X2 Dual Core Processor
>> 3800+'
> 
> Hmmm, interesting point. Is the storage daemon single threaded? I'll
> check this, since I have an "old" dual core Xeon, but I checked that
> there were at least 25% idle of CPU time (ZFS compression is used, so
> that client-to-disk backup is uncompressed avoiding interference with
> later (migration) tape drive compression)
> 
>>> Maybe this was not an issue with slower tape drives. LTO2 would only
>>> suffer from about 6% performance loss, if my wild guess has any truth...
>>
>> LTO4 as well here, and no ear next to the drive. However, 'mt status'
>> won't run as the drive is in use by the copy jobs, how you got that
>> info?
> 
> On FreeBSD there's a special control device for every sequential access
> device (/dev/sa0 /dev/sa0.ctl for example).
> 
> No luck so far finding technical end-user-details for the design of LTO
> drives. I'm really wondering if file marks are written to the tape or
> just in the cartridge memory chip. And even if they get written to the
> tape, is it unavoidable that streaming is interrupted???

But reading the great official bacula manual answered one of my
questions and proved my wild guess to be correct.
In Chapter 19, page 188, one can find:

Maximum File Size = size No more than size bytes will be written into a
given logical file on the volume.
Once this size is reached, an end of file mark is written on the
volume and subsequent data are written into the next file.
...
If you are configuring an LTO-3 or LTO-4 tape, you probably will
want to set the Maximum File Size to 2GB to avoid making the drive
stop to write an EOF mark.
...

I set it to 100G and the frequent interruptions vanished :-)
Maybe the suggested 2GB is well chosen for LTO-3; I think for LTO-4 you
need much larger values than 2GB.
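
In case it saves someone a search: the directive lives in the storage daemon's 
Device resource. A minimal sketch (device name and path taken from this thread, 
the rest illustrative):

Device {
  Name = "LTO4"
  Archive Device = /dev/nst0
  Maximum File Size = 100G    # write an EOF mark only every 100 GB, as above,
                              # instead of the much smaller default
}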

Now I still have the problem that I don't get more than 110MB/s using
bacula to the drive, while %busy states 55% and the disks from the ZFS
pool only read 15MB/s, reflecting that the currently written material is
compressed by almost 2:1. I have seen transfer rates of 150MB/s with tar...
CPU is 83% idle, ZFS disks <10% busy and the tape drive ~50% idle. Why
don't I get the 150MB/s seen with tar ...?!?...
I'll report if I find out and thank you in advance for any hints.

-Harry





Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

2011-07-10 Thread Dan Langille

On Jul 10, 2011, at 8:17 AM, Steve Costaras wrote:

> 
> 
> I am trying a full backup/multi-job to a single client and all was going well 
> until this morning when I received the error below.   All other jobs were 
> also canceled.  
> 
> My question is two fold:
> 
> 1) What the heck is this error?   I can unmount the drive, issue a rawfill to 
> the tape w/ btape and no problems?   

I don't know.  Perhaps someone else will.

> 
> 2) since everything is spooled first, there should be NO error that should 
> cancel a job.   A tape drive could fail, a tape could burst into flame,  all 
> that would be needed was bacula to know that there was an issue and give the 
> admin a simple statement do you want to fix the issue or cancel?, the admin 
> to fix the problem, and then bacula told to restart from the last block that 
> was stored successfully OR if need be from the beginning of the spooled data 
> file.

This I do know.  Although, at first glance it seems easy to do this, it is not. 
   If it was trivial to do, I assure you, it would already be in place.

> Canceling jobs that run for days for TB's of data is just screwed up.

I suggest running smaller jobs.  I don't mean to sound trite, but that really 
is the solution.  Given that the alternative is non-trivial, the sensible 
choice is, I'm afraid, cancel the job.

> 
> Steve 
> 
> 
> 3000 OK label. VolBytes=1024 DVD=0 Volume="FA0016" Device="LTO4" (/dev/nst0)
> Requesting to mount LTO4 ...
> 3905 Bizarre wait state 7
> Do not forget to mount the drive!!!
> 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume "FA0016" on 
> device "LTO4" (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: New volume "FA0016" mounted on device "LTO4" 
> (/dev/nst0) at 10-Jul-2011 03:51.
> 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on 
> read-only Volume. dev="LTO4" (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: End of medium on Volume "FA0016" Bytes=1,024 
> Blocks=0 at 10-Jul-2011 03:51.
> 2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
> 2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. 
> Cannot write overflow block to device "LTO4" (/dev/nst0). ERR=Input/output 
> error
> 
> *
> 2011-07-10 03SD-loki JobId 6: Despooling elapsed time = 02:32:53, Transfer 
> rate = 93.64 M Bytes/second
> 2011-07-10 03SD-loki JobId 6: Job write elapsed time = 57:37:54, Transfer 
> rate = 8.278 M Bytes/second
> 2011-07-10 03FD-loki JobId 6: Error: bsock.c:393 Write error sending 65536 
> bytes to Storage daemon:loki:9103: ERR=Connection reset by peer
> 2011-07-10 03FD-loki JobId 6: Fatal error: backup.c:1024 Network send error 
> to SD. ERR=Connection reset by peer
> 2011-07-10 03SD-loki JobId 7: Fatal error: block.c:439 Attempt to write on 
> read-only Volume. dev="LTO4" (/dev/nst0)
> 2011-07-10 03SD-loki JobId 7: Fatal error: spool.c:301 Fatal append error on 
> device "LTO4" (/dev/nst0): ERR=block.c:1015 Read zero bytes at 0:0 on device 
> "LTO4" (/dev/nst0).
> 
> 2011-07-10 03SD-loki JobId 7: Despooling elapsed time = 00:00:01, Transfer 
> rate = 858.9 G Bytes/second
> *
> 2011-07-10 03DIR-loki JobId 6: Error: Bacula DIR-loki 5.0.3 (04Aug10): 
> 10-Jul-2011 03:52:08
>  Build OS:   x86_64-unknown-linux-gnu ubuntu 10.04
>  JobId:  6
>  Job:
> JOB-loki_var_ftp_pub_Multimedia_DVD.2011-07-07_17.45.01_08
>  Backup Level:   Full
>  Client: "FD-loki" 5.0.3 (04Aug10) 
> x86_64-unknown-linux-gnu,ubuntu,10.04
>  FileSet:"FS-loki_var_ftp_pub_Multimedia_DVD" 2011-07-06 
> 18:00:01
>  Pool:   "BackupSetFA" (From Run FullPool override)
>  Catalog:"MyCatalog" (From Client resource)
>  Storage:"LTO4" (From Pool resource)
>  Scheduled time: 07-Jul-2011 17:45:01
>  Start time: 07-Jul-2011 17:50:30
>  End time:   10-Jul-2011 03:52:08
>  Elapsed time:   2 days 10 hours 1 min 38 secs
>  Priority:   50
>  FD Files Written:   452
>  SD Files Written:   452
>  FD Bytes Written:   1,717,640,639,816 (1.717 TB)
>  SD Bytes Written:   1,717,632,388,872 (1.717 TB)
>  Rate:   8222.4 KB/s
>  Software Compression:   None
>  VSS:no
>  Encryption: no
>  Accurate:   yes
>  Volume name(s): FA0011|FA0012|FA0015
>  Volume Session Id:  6
>  Volume Session Time:1310078212
>  Last Volume Bytes:  1,024 (1.024 KB)
>  Non-fatal FD errors:1
>  SD Errors:  0
>  FD termination status:  Error
>  SD termination status:  Error
>  Termination:*** Backup Error ***
> ---
> 
> 
> 

Re: [Bacula-users] Device

2011-07-10 Thread Mike Hobbs
On 6/29/2011 5:05 PM, Josh Fisher wrote:
> By default, Bacula will select a volume that is already in a drive in
> preference to a volume not in a drive. For concurrent jobs writing to
> the same pool, this means they will always select the same volume. Thus
> if you set MaximumConcurrentJobs=1 in the SD Device, then it will not be
> possible to run concurrent jobs that write to the same pool, because the
> concurrent jobs will select the same volume, which can only be written
> to by one job at a time, forcing them to be serialized. To get around
> the default behavior, set PreferMountedVolumes=no in the Job definition
> of the jobs that will both run concurrently AND write to the same pool.
> This will cause the opposite behavior. Bacula will prefer selecting a
> volume that is NOT already in use in a drive, effectively meaning it
> will select a volume that is not already in use, loading it into another
> drive if necessary. This way, jobs writing to the same pool can run
> concurrently, each writing to a different volume, ensuring that volume
> data is not interleaved.
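
A minimal sketch of the override Josh describes (the job name is invented; the 
directive itself is the standard Job-resource setting):

Job {
  Name = "concurrent-job-A"        # one of the jobs that runs concurrently
  JobDefs = "DefaultJob"
  Prefer Mounted Volumes = no      # let this job pick a volume/drive not already in use
}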

I've had concurrent backups working fine for the past few days.  This 
morning I logged in to check my server and noticed that bacula/vchanger is 
only using 2 of my 4 virtual drives.  I don't understand why; could you 
point me in the right direction to find out why this is?

Also, I was wondering if you could explain scheduling/priority to me.  I 
just assumed that bacula ran jobs in the order they were added, but I 
have noticed this is not the case: bacula seems to add jobs to the 
scheduler and then just pick random jobs in the queue to 
run.  Ideally, I would like my 
incrementals to have a higher priority than my full backups, and I would 
like my restore jobs to have a higher priority than my incrementals. 
Right now I am using only one JobDefs, so all my jobs are running with a 
priority of 10.  I changed my Restores job to have a priority of 9, and I 
also set Allow Mixed Priority = yes in all my jobs, but my restore jobs 
are not being run first.  In fact, as I stated above, I can't seem to 
figure out how bacula is deciding which jobs to run first and why.  To 
me it seems like the priority line doesn't work, or I am configuring it 
wrong.


Thank you!

mike



Re: [Bacula-users] Copy/Migration utilization

2011-07-10 Thread Harald Schmalzbauer
schrieb Adrian Reyer am 10.07.2011 15:58 (localtime):
...
> I found using Spool Data for copy jobs to be faster for my setup. I have
> fast local disks for spooling, but some of my disk storage is accessed
> vie iSCSI on 1-GBit/s-links.
> However, I am currently running a few copy jobs and the limiting factor
> seems to be my bacula-sd consuming 1 complete CPU, throttling me at
> 55MB/s. The CPU is an older 'AMD Athlon(tm) 64 X2 Dual Core Processor
> 3800+'

Hmmm, interesting point. Is the storage daemon single-threaded? I'll
check this, since I have an "old" dual-core Xeon, but I did check that
there was at least 25% idle CPU time. (ZFS compression is used, so
the client-to-disk backup is uncompressed, avoiding interference with
later (migration) tape drive compression.)

>> Maybe this was not an issue with slower tape drives. LTO2 would only
>> suffer from about 6% performance loss, if my wild guess has any truth...
> 
> LTO4 as well here, and no ear next to the drive. However, 'mt status'
> won't run as the drive is in use by the copy jobs, how you got that
> info?

On FreeBSD there's a special control device for every sequential access
device (/dev/sa0 /dev/sa0.ctl for example).

No luck so far finding technical end-user details on the design of LTO
drives. I'm really wondering if file marks are written to the tape or
just to the cartridge memory chip. And even if they get written to the
tape, is it unavoidable that streaming is interrupted???
So many questions; hopefully some tape guru will read this...

Thanks,

-Harry






Re: [Bacula-users] Copy/Migration utilization

2011-07-10 Thread Adrian Reyer
Hi Harry,

On Sun, Jul 10, 2011 at 03:39:36PM +0200, Harald Schmalzbauer wrote:
> The server is at remote site, so I can't hear any mechanicals, but I
> guess "at rest" means stop, thus my worries about extensive repositioning.

Sorry, I am no expert on drives and have only been using bacula myself for 6
weeks now.

> I have no backup jobs using the tape drive, so no spool is in use. I
> only use the tape drive for migration (or sometimes copy) jobs. And in
> the disk-pool I use "Use Volume Once = yes", so every job has it's own
> file without interleaved data, which has exactly the size the job
> summary reports.

I found using Spool Data for copy jobs to be faster for my setup. I have
fast local disks for spooling, but some of my disk storage is accessed
via iSCSI on 1-GBit/s links.
However, I am currently running a few copy jobs and the limiting factor
seems to be my bacula-sd consuming 1 complete CPU, throttling me at
55MB/s. The CPU is an older 'AMD Athlon(tm) 64 X2 Dual Core Processor
3800+'.

> Maybe this was not an issue with slower tape drives. LTO2 would only
> suffer from about 6% performance loss, if my wild guess has any truth...

LTO4 as well here, and no ear next to the drive. However, 'mt status'
won't run as the drive is in use by the copy jobs; how did you get that
info?

Regards,
Adrian
-- 
LiHAS - Adrian Reyer - Hessenwiesenstraße 10 - D-70565 Stuttgart
Fon: +49 (7 11) 78 28 50 90 - Fax:  +49 (7 11) 78 28 50 91
Mail: li...@lihas.de - Web: http://lihas.de
Linux, Netzwerke, Consulting & Support - USt-ID: DE 227 816 626 Stuttgart




Re: [Bacula-users] Copy/Migration utilization

2011-07-10 Thread Harald Schmalzbauer
schrieb Adrian Reyer am 10.07.2011 14:43 (localtime):
> Hi Harry,
> 
> On Sun, Jul 10, 2011 at 01:38:52PM +0200, Harald Schmalzbauer wrote:
>> of a average transfer rate of 65MB/s makes me worrying about massive
>> repositioning.
> 
> AFAIK LTO-Drives have adaptive speeds compared to older technologies. If
> the data comes in slower, the drive will just run slower on a somewhat
> constant speed. No more stop-and-go.

Hi Adrian,

thanks for your reply. I read about the throttling capability, but `mt
status` shows drive state "at rest" even during active transfers.
The server is at a remote site, so I can't hear any mechanicals, but I
guess "at rest" means stopped, hence my worries about extensive repositioning.

>> Can I optimize my setup so that there won't be so many new files written
>> on tape? Or should the creation of a new file mark been done without
>> interruption of the transfer, and there's something wrong with my setup?
> 
> Do you use 'Spool Data = yes'?
> To my understanding you can run multiple jobs to storage the same time,
> but they end up interleaved. Spooling the data will write full jobs or
> at least bigger chunks of a job in one run.

I have no backup jobs using the tape drive, so no spool is in use. I
only use the tape drive for migration (or sometimes copy) jobs. And in
the disk pool I use "Use Volume Once = yes", so every job has its own
file without interleaved data, which has exactly the size the job
summary reports.
Do you know if marking a "new file" on tape interrupts the LTO drive's
streaming? Perhaps it shouldn't interrupt streaming, writing
many files for one job is a well-chosen design, and I'm suffering from
some other misconfiguration that leads to an interruption on marking a new file.
If it's technically not possible to keep the drive streaming while
marking a new file, then I'm interested in tweaks for avoiding hundreds
of "new file" marks per backup job.
Do others see 208 files after 200G written to tape?
Wild guess: if one file mark is written every 1 GByte (which takes at most
12.8 sec in my case), and this file mark interrupts the drive for roughly
4 seconds, then my transfer rate decreases from the usual rate for
uncompressable material of 80MB/s by 25% to ~60MB/s. That would exactly
match the numbers I'm seeing here...
Which means the drive is only streaming 75% of the time, and the rest is
used for repositioning :-(
Maybe this was not an issue with slower tape drives. LTO2 would only
suffer about a 6% performance loss, if my wild guess has any truth...

Thanks,

-Harry





Re: [Bacula-users] Copy/Migration utilization

2011-07-10 Thread Adrian Reyer
Hi Harry,

On Sun, Jul 10, 2011 at 01:38:52PM +0200, Harald Schmalzbauer wrote:
> of a average transfer rate of 65MB/s makes me worrying about massive
> repositioning.

AFAIK LTO drives have adaptive speeds compared to older technologies. If
the data comes in slower, the drive will just run slower at a somewhat
constant speed. No more stop-and-go.

> Can I optimize my setup so that there won't be so many new files written
> on tape? Or should the creation of a new file mark been done without
> interruption of the transfer, and there's something wrong with my setup?

Do you use 'Spool Data = yes'?
To my understanding you can run multiple jobs to the same storage at the
same time, but they end up interleaved. Spooling the data will write full
jobs, or at least bigger chunks of a job, in one run.

Regards,
Adrian
-- 
LiHAS - Adrian Reyer - Hessenwiesenstraße 10 - D-70565 Stuttgart
Fon: +49 (7 11) 78 28 50 90 - Fax:  +49 (7 11) 78 28 50 91
Mail: li...@lihas.de - Web: http://lihas.de
Linux, Netzwerke, Consulting & Support - USt-ID: DE 227 816 626 Stuttgart




[Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

2011-07-10 Thread Steve Costaras


I am running a full, multi-job backup of a single client, and all was going well 
until this morning when I received the error below.   All other jobs were also 
canceled.  

My question is twofold:

1) What the heck is this error?   I can unmount the drive, issue a rawfill to 
the tape with btape, and see no problems.   

2) Since everything is spooled first, there should be NO error that cancels a 
job.   A tape drive could fail, a tape could burst into flame; all that would 
be needed is for bacula to know that there was an issue, give the admin a 
simple prompt (do you want to fix the issue or cancel?), let the admin fix the 
problem, and then have bacula restart from the last block that was stored 
successfully, or if need be from the beginning of the spooled data file.

Canceling jobs that run for days for TB's of data is just screwed up.

Steve 


3000 OK label. VolBytes=1024 DVD=0 Volume="FA0016" Device="LTO4" (/dev/nst0)
Requesting to mount LTO4 ...
3905 Bizarre wait state 7
Do not forget to mount the drive!!!
2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume "FA0016" on 
device "LTO4" (/dev/nst0)
2011-07-10 03SD-loki JobId 6: New volume "FA0016" mounted on device "LTO4" 
(/dev/nst0) at 10-Jul-2011 03:51.
2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on 
read-only Volume. dev="LTO4" (/dev/nst0)
2011-07-10 03SD-loki JobId 6: End of medium on Volume "FA0016" Bytes=1,024 
Blocks=0 at 10-Jul-2011 03:51.
2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. 
Cannot write overflow block to device "LTO4" (/dev/nst0). ERR=Input/output error

*
2011-07-10 03SD-loki JobId 6: Despooling elapsed time = 02:32:53, Transfer rate 
= 93.64 M Bytes/second
2011-07-10 03SD-loki JobId 6: Job write elapsed time = 57:37:54, Transfer rate 
= 8.278 M Bytes/second
2011-07-10 03FD-loki JobId 6: Error: bsock.c:393 Write error sending 65536 
bytes to Storage daemon:loki:9103: ERR=Connection reset by peer
2011-07-10 03FD-loki JobId 6: Fatal error: backup.c:1024 Network send error to 
SD. ERR=Connection reset by peer
2011-07-10 03SD-loki JobId 7: Fatal error: block.c:439 Attempt to write on 
read-only Volume. dev="LTO4" (/dev/nst0)
2011-07-10 03SD-loki JobId 7: Fatal error: spool.c:301 Fatal append error on 
device "LTO4" (/dev/nst0): ERR=block.c:1015 Read zero bytes at 0:0 on device 
"LTO4" (/dev/nst0).

2011-07-10 03SD-loki JobId 7: Despooling elapsed time = 00:00:01, Transfer rate 
= 858.9 G Bytes/second
*
2011-07-10 03DIR-loki JobId 6: Error: Bacula DIR-loki 5.0.3 (04Aug10): 
10-Jul-2011 03:52:08
  Build OS:   x86_64-unknown-linux-gnu ubuntu 10.04
  JobId:  6
  Job:
JOB-loki_var_ftp_pub_Multimedia_DVD.2011-07-07_17.45.01_08
  Backup Level:   Full
  Client: "FD-loki" 5.0.3 (04Aug10) 
x86_64-unknown-linux-gnu,ubuntu,10.04
  FileSet:"FS-loki_var_ftp_pub_Multimedia_DVD" 2011-07-06 
18:00:01
  Pool:   "BackupSetFA" (From Run FullPool override)
  Catalog:"MyCatalog" (From Client resource)
  Storage:"LTO4" (From Pool resource)
  Scheduled time: 07-Jul-2011 17:45:01
  Start time: 07-Jul-2011 17:50:30
  End time:   10-Jul-2011 03:52:08
  Elapsed time:   2 days 10 hours 1 min 38 secs
  Priority:   50
  FD Files Written:   452
  SD Files Written:   452
  FD Bytes Written:   1,717,640,639,816 (1.717 TB)
  SD Bytes Written:   1,717,632,388,872 (1.717 TB)
  Rate:   8222.4 KB/s
  Software Compression:   None
  VSS:no
  Encryption: no
  Accurate:   yes
  Volume name(s): FA0011|FA0012|FA0015
  Volume Session Id:  6
  Volume Session Time:1310078212
  Last Volume Bytes:  1,024 (1.024 KB)
  Non-fatal FD errors:1
  SD Errors:  0
  FD termination status:  Error
  SD termination status:  Error
  Termination:*** Backup Error ***
---





[Bacula-users] Copy/Migration utilization

2011-07-10 Thread Harald Schmalzbauer
Dear bacula insiders,

I got a replacement for my old DDS5 - a LTO4.
Setup is D2D2T.
The problem is not the setup, but the tape drive utilization.
The disk-storage can provide well over 300MByte/s, and using tar with -b
126 or dump or dd, I see 78-160MB/s moving to the drive. So the problem
is probably not hardware related.

When I have bacula copying a job to tape, there are very often outages
where no data at all is transferred to the tape.
I found out that after this break, `mt status` reports an incremented
file number. When does bacula write a new file, and why so often?
I'd like to treat my tape drive as gently as possible, so the job result
of an average transfer rate of 65MB/s makes me worry about massive
repositioning.
Can I optimize my setup so that there won't be so many new files written
on tape? Or should the creation of a new file mark be possible without
interrupting the transfer, meaning there's something wrong with my setup?

Another strange thing for me is the device utilization. When using tar I
see a constant %busy report of ~90 for the tape device, no matter
whether the transfer rate is 80MB/s or (due to compressible material) 120MB/s.
When bacula writes 75MB/s, I get only 68% busy reported ?!? Up to the
break point (new file), after which I get 350% usage. And if
compression allows 100MB/s, the %busy rate decreases to 50!?!
But this is probably OS-specific (FreeBSD); just in case, maybe someone could
enlighten me as to why this is not the case with other tape writers...

Thanks in advance,

-Harry


