[Bacula-users] The number of files mismatch! Marking volume in Error in Catalog

2009-06-02 Thread Bob Hetzel

Greetings,

I've been seeing an issue whereby a volume gets marked in error 
periodically.  The last items logged about that volume are typically like this:

02-Jun 11:53 gyrus-sd JobId 83311: Volume "LTO224L2" previously written, 
moving to end of data.
02-Jun 11:53 gyrus-sd JobId 83311: Error: Bacula cannot write on tape 
Volume "LTO224L2" because:
The number of files mismatch! Volume=46 Catalog=45
02-Jun 11:53 gyrus-sd JobId 83311: Marking Volume "LTO224L2" in Error in 
Catalog.
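
A minimal sketch, in Python, of how messages like the one above could be
pulled out of a saved log to see how far the two counters are apart. The
regular expression is built only from the lines quoted here, so other
Bacula versions may word the message differently; the function name and
the dictionary keys are made up for illustration.

```python
import re

# Built only from the message quoted in this thread; \s+ lets the match
# span the line breaks inside the message.
MISMATCH_RE = re.compile(
    r'cannot write on tape\s+Volume\s+"(?P<volume>[^"]+)"\s+because:\s+'
    r'The number of files mismatch!\s+'
    r'Volume=(?P<on_volume>\d+)\s+Catalog=(?P<in_catalog>\d+)'
)


def find_file_mismatches(log_text):
    """Return one dict per 'number of files mismatch' message in log_text."""
    found = []
    for match in MISMATCH_RE.finditer(log_text):
        on_volume = int(match.group("on_volume"))
        in_catalog = int(match.group("in_catalog"))
        found.append({
            "volume": match.group("volume"),
            "on_volume": on_volume,      # file count read from the tape
            "in_catalog": in_catalog,    # file count stored in the catalog
            "difference": on_volume - in_catalog,
        })
    return found


SAMPLE = '''02-Jun 11:53 gyrus-sd JobId 83311: Error: Bacula cannot write on tape
Volume "LTO224L2" because:
The number of files mismatch! Volume=46 Catalog=45
'''

for hit in find_file_mismatches(SAMPLE):
    print(hit)
```

For the message quoted above this reports volume LTO224L2 with a
difference of 1, i.e. the tape holds one more file than the catalog
knows about.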

I don't think I have any SCSI errors, but instead the problem seems to be 
related to bacula not properly keeping track of the volume files in some 
rare case.

This time the problem happened not too long after the volume got recycled 
and so I noted one thing about how the tape was used... a backup started on 
another volume and then spanned onto it.  Could that be a source of these 
problems?

Here's the pertinent part of the Bacula log file. Debugging is not turned on 
right now, but I'm hoping enough got logged to help.  If not, I'll have to 
turn debugging back on, but what debug level would be good for determining 
the source of that error?

http://casemed.case.edu/admin_computing/bacula/bacula-2009-06-01.log.txt

Bob

--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] The number of files mismatch! Marking volume in Error in Catalog

2009-06-04 Thread Christian Gaul
Bob Hetzel wrote:
> Greetings,
>
> I've been seeing an issue whereby a volume gets marked in error 
> periodically.  The last items logged about that volume are typically like 
> this:
>
> 02-Jun 11:53 gyrus-sd JobId 83311: Volume "LTO224L2" previously written, 
> moving to end of data.
> 02-Jun 11:53 gyrus-sd JobId 83311: Error: Bacula cannot write on tape 
> Volume "LTO224L2" because:
> The number of files mismatch! Volume=46 Catalog=45
> 02-Jun 11:53 gyrus-sd JobId 83311: Marking Volume "LTO224L2" in Error in 
> Catalog.
To me this looks like an issue reported a couple of times on this list,
once by me and once by another user, whereby Bacula isn't updating the
Volume Files when running concurrent jobs.

So far nobody has seemed interested in it. For me and the other user,
setting the maximum concurrent jobs to 1 on the device has "worked".
Yes, you will have jobs piling up for hours until they get worked off.
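
For anyone wanting to try this workaround, here is a rough sketch of one
place where such a limit can be set: the Maximum Concurrent Jobs directive
of a Storage resource in the Director configuration. Whether that matches
"on the device" as meant above is an assumption, and every name, address
and password below is a placeholder, not taken from anyone's real setup.

```conf
# bacula-dir.conf (Director) -- hypothetical Storage resource,
# all values are placeholders
Storage {
  Name = TapeLibrary              # placeholder resource name
  Address = sd.example.org        # placeholder Storage daemon address
  SDPort = 9103
  Password = "changeme"           # placeholder
  Device = Autochanger            # placeholder, must match the SD's device
  Media Type = LTO-2              # placeholder
  Maximum Concurrent Jobs = 1     # run one job at a time on this storage
}
```

With the limit at 1, jobs that use this storage queue up and run one after
another, which is the piling up described above.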

I first saw this after upgrading from 2.4.4 to 3.0.0, but I have not
been able to track it down myself, or I would have filed a proper
bug report for it.

Hope that helps a little



Re: [Bacula-users] The number of files mismatch! Marking volume in Error in Catalog

2009-06-04 Thread Uwe Schuerkamp
On Thu, Jun 04, 2009 at 09:20:34AM +0200, Christian Gaul wrote:
> To me this looks like an issue reported a couple of times on this list,
> once by me and once by another user, whereby Bacula isn't updating the
> Volume Files when running concurrent jobs.
> 
> So far nobody has seemed interested in it. For me and the other user,
> setting the maximum concurrent jobs to 1 on the device has "worked".
> Yes, you will have jobs piling up for hours until they get worked off.
> 
> I first saw this after upgrading from 2.4.4 to 3.0.0, but I have not
> been able to track it down myself, or I would have filed a proper
> bug report for it.
> 
> Hope that helps a little

Hi, 

we're running Bacula 2.2.8 with concurrent jobs = 2 on a disk-based
set of volumes. I've done several restores from those volumes without
any errors, and I haven't seen the error you mention in the good 3 months
or so since switching from concurrent jobs = 1 to 2, so I'd
consider this a "positive" report that the feature actually does
work. The bug may have been introduced in a later version of
Bacula. 

All the best, 

Uwe 


-- 
uwe.schuerk...@nionex.net phone: [+49] 5242.91 - 4740, fax:-69 72
Hauptsitz: Avenwedder Str. 55, D-33311 Guetersloh, Germany
Registergericht Guetersloh HRB 4196, Geschaeftsfuehrer: Horst Gosewehr
NIONEX ist ein Unternehmen der DirectGroup Germany www.directgroupgermany.de



Re: [Bacula-users] The number of files mismatch! Marking volume in Error in Catalog

2009-06-05 Thread user100
  On 04.06.2009 10:16, Uwe Schuerkamp wrote:
> Hi,
>
> we're running Bacula 2.2.8 with concurrent jobs = 2 on a disk-based
> set of volumes. I've done several restores from those volumes without
> any errors, and I haven't seen the error you mention in the good 3 months
> or so since switching from concurrent jobs = 1 to 2, so I'd
> consider this a "positive" report that the feature actually does
> work. The bug may have been introduced in a later version of
> Bacula.
>
> All the best,
>
> Uwe
>
>
Concurrent jobs worked well for years on our backup machine too, with 
2.2.8 and earlier versions. Since the upgrade to 3.0.1 we get the files 
mismatch.
I have tried different storage daemon settings on CentOS, set up a new 
backup server on Debian for testing, upgraded the firmware on the 
autoloader, changed the database, changed the tapes and run the btape 
test (again); none of this has helped so far. With max concurrent 
jobs = 1 it works. So for now the common denominator for this failure 
seems to me to be (the upgrade to) Bacula 3.0.1.

In addition, I had a failure with concurrent jobs on file-based backups, 
but that seems to be solved now. It was the same story as on tape: each 
time two jobs were started, the second one failed after a few seconds 
(with a slightly different error message than on tape). After 
recompiling on a freshly set up Debian server I no longer got this 
failure. So I recompiled on CentOS with the same configure settings, and 
concurrent file-based backups now seem to work on CentOS too. I also 
tried the old 3.0.1 build directory (/usr/src/bacula-...), whose 
installation had not worked on CentOS; however, I did not get the 
failure on concurrent file-based backups with that build either! I 
don't know exactly what changed in the meantime, apart from the default 
paths used by the new build (which had fewer configure options). I found 
out that "make uninstall" does not remove all files from the system, so 
my guess is that an old file from some other Bacula version (maybe 
1.3.8, 2.0.x or 2.2.8) was sitting in a directory searched earlier in 
the PATH and caused the trouble, and that it was overwritten by the 
later compilation and installation (with other path names). Before 
that, the failure was perfectly reproducible with file-based backups. 
However, tape-based backups still don't work with 3.0.1 and concurrent 
jobs (even on a freshly set up server with no previous Bacula 
installation).
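
The guess about a stale file earlier in the PATH can be checked
mechanically. A minimal sketch in Python, assuming a POSIX system: it
lists every executable with a given name across the PATH directories in
search order, so more than one hit for a Bacula program points to a
leftover copy. The function name is made up for illustration, and the
program names in the loop may differ on a given installation.

```python
import os


def find_all_on_path(name, path=None):
    """List every executable file called `name` in the PATH directories,
    in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    seen = set()
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
            continue
        real = os.path.realpath(candidate)
        if real in seen:
            continue  # same file reached via a symlink or a repeated entry
        seen.add(real)
        hits.append(candidate)
    return hits


if __name__ == "__main__":
    for program in ("bacula-dir", "bacula-sd", "bacula-fd", "bconsole", "btape"):
        copies = find_all_on_path(program)
        if len(copies) > 1:
            print(program, "found more than once:", ", ".join(copies))
```

The daemons are often installed in sbin directories that are not on a
normal user's PATH, so it may be necessary to pass an explicit search
string as the second argument, built from the install prefixes that were
actually used.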


Greetings,
user100
