Hello,

no, you probably didn't find out which files are missing. After each
restore we compare the restored files with the originals. The conclusion
is that files really are missing! (As I mentioned, these are not hard
links, sockets, etc. - in one test the whole /home/ directory and all
files in it were missing!)
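
For anyone who wants to run the same comparison, here is a minimal
sketch in Python. It is not our actual script: the function name and the
two directory arguments are made up for illustration. It assumes the
original tree is still available, and it only looks at regular files, so
symlinks, sockets and similar entries are skipped on purpose.

```python
import os

def compare_trees(original_root, restored_root):
    """Return (missing, size_mismatch) for regular files under original_root.

    missing       -- relative paths present in the original but not restored
    size_mismatch -- (relative path, original size, restored size) tuples
    """
    missing, size_mismatch = [], []
    for dirpath, _dirnames, filenames in os.walk(original_root):
        rel_dir = os.path.relpath(dirpath, original_root)
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Skip symlinks, sockets, FIFOs and device nodes.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.normpath(os.path.join(rel_dir, name))
            dst = os.path.join(restored_root, rel)
            if not os.path.isfile(dst):
                missing.append(rel)
            else:
                src_size = os.path.getsize(src)
                dst_size = os.path.getsize(dst)
                if src_size != dst_size:
                    size_mismatch.append((rel, src_size, dst_size))
    return sorted(missing), sorted(size_mismatch)
```

Call it as compare_trees(original_dir, restore_dir) and print both
lists. Files that changed after the backup will also show up as size
mismatches, so it is best run against data that does not change.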

Bacula's counter is OK, and from our tests I can say that the only good
restore is one where those numbers match. If you see a difference like
the one below, you can be sure your restored file set is really wrong.

Could you please check whether the above is true for your restores? It
would be helpful to know we are not alone.

P.S. And please, anyone who has never done a restore, please run some
tests. That way you will be sure you have a valid backup OR you will
know you have an invalid one ;) The easiest test is to do a full restore
somewhere and check whether "Files Expected" and "Files Restored" differ
much (and/or whether the error about a bad file size appears).
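
A small sketch of that check in Python, assuming the restore job report
has been saved as text and that the two counter lines look like the
excerpt quoted further down in this thread. The function name is
hypothetical, not something Bacula provides.

```python
import re

def restore_counts(report_text):
    """Extract (expected, restored) file counts from a restore job report."""
    counts = {}
    for label in ("Files Expected", "Files Restored"):
        match = re.search(rf"{label}:\s*([\d,]+)", report_text)
        if match is None:
            raise ValueError(f"'{label}' not found in report")
        # The counters use thousands separators, e.g. 190,718.
        counts[label] = int(match.group(1).replace(",", ""))
    return counts["Files Expected"], counts["Files Restored"]

# Sample taken from the numbers quoted in this thread.
report = """
  Files Expected:         190,718
  Files Restored:         166,097
"""
expected, restored = restore_counts(report)
print(f"expected={expected} restored={restored} missing={expected - restored}")
# prints: expected=190718 restored=166097 missing=24621
```

Any non-zero difference is worth following up with a file-by-file
comparison.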

Regards.


Monday, July 23, 2007, 11:18:00 PM:

>> sometimes not all files are restored, but tens of thousands are 
>> missing, an example:
>> Files Expected:         190,718
>> Files Restored:         166,097
>> This happens more often (~one case per 2 jobs).

J> Just to say that I see this too every time I restore, but I think
J> you can ignore it. From my observations and the little tests I
J> made, the difference between "files expected" and "files restored" is
J> the number of directories. It seems that Bacula doesn't handle this
J> properly (a bug in the counter...?).

J> Regards,
J> Julien


J> On Mon, 2007-07-23 at 23:01 +0300, Doytchin Spiridonov wrote:
>> Hello,
>> 
>> I forgot to mention something very IMPORTANT: I discovered that in
>> *all* such cases (files restored with a larger size), if we don't
>> perform a full restore but restore a SINGLE file, it is restored OK
>> with the *correct* size and content. It is OK even if we restore the
>> directory it is in (with the other files in it).
>> 
>> Which proves it is not a problem with the FS, kernel, Xen, LVM,
>> hardware, etc., but a problem with Bacula.
>> 
>> Regards
>> 
>> 
>> Monday, July 23, 2007, 9:57:40 PM:
>> 
>> DS> Hello,
>> 
>> DS> I filed this as a bug, but since Kern couldn't reproduce it he gave
>> DS> up. So let us find out here what the problem could be. There are
>> DS> actually two problems; they could be linked.
>> 
>> DS> Here is the history:
>> DS> Initially we were using 2.0.3. After running backups for several
>> DS> weeks I wanted to restore a file and was surprised that I couldn't.
>> DS> It was listed in the catalog, I could select it and run a restore
>> DS> job, but the file didn't come up. Investigating what had happened, I
>> DS> ran a full restore job and was surprised that several files were
>> DS> missing in that directory (where the file is). An error message
>> DS> similar to the one in my first post here was also present. In
>> DS> addition, there was a big difference between the marked files and
>> DS> the files actually restored (certainly not hard links, sockets or
>> DS> anything else that is ignored by Bacula - in one of the tests the
>> DS> whole /home/ directory was missing).
>> DS> After that we ran tests (full/diff/inc backups, restores, etc.)
>> DS> for a week. Similar errors kept happening, each time at random
>> DS> places/files. Sometimes there are errors, sometimes not. We haven't
>> DS> run enough tests to determine when this happens. But IT HAPPENS,
>> DS> and as a result we don't have a reliable backup. I know a lot of
>> DS> people run backups w/o testing restores, and that's why (if this is
>> DS> not related to our specific setup) the problem would only show up
>> DS> for them in an emergency, which doesn't happen often. Anyway, here
>> DS> are the hardware and setup details:
>> 
>> DS> *** Bacula: 2.1.28 on all servers.
>> DS> Yesterday we cleaned everything (Bacula DB and volumes) and
>> DS> installed the latest beta *2.1.28* everywhere (note that this is not
>> DS> a problem of the beta, as we discovered it when we had 2.0.3). 2.1.28
>> DS> fixed 2 other problems we discovered with 2.0.3, but this one is
>> DS> still there.
>> DS> The Director and most of the servers are 64 bit; two of the servers
>> DS> are 32 bit.
>> DS> *** OS: Linux CentOS 4.5
>> DS> *** MySQL: 5.0.37
>> DS> *** Servers (all are almost identical): Supermicro, PDSME - Intel
>> DS> E7230 (Mukilteo) chipset, Intel Pentium D 930 Dual Core 3.0GHz, 3Ware
>> DS> IDE RAID Controller Escalade 9550SX. Servers have 4 disks each in RAID
>> DS> 1+0, only the Bacula server has many disks in RAID 5.
>> DS> *** Some servers are plain CentOS, some have Xen with virtual servers.
>> DS> The Bacula server itself also has Xen, but Bacula is running in
>> DS> Dom0, and no other virtual machines are running on it at this time.
>> DS> *** The servers with Xen also have LVM.
>> DS> *** We run concurrent jobs (and I guess this is where Bacula's
>> DS> problem lies).
>> DS> *** GZIP compression is enabled.
>> DS> *** We save volumes on hard disk; their size is set to 4480MB.
>> 
>> DS> --- How to get an error:
>> DS> As we initially discovered the error after several weeks of backups,
>> DS> we guessed that it could be caused by a wrong setting of Volume
>> DS> Retention or some other retention time on our side, so that some
>> DS> files were purged.
>> 
>> DS> We started everything from zero again, and after 3 days (it happened
>> DS> that the first backup was Full, the next Differential and the last
>> DS> Incremental) we performed a test and the error happened again! So we
>> DS> were sure this is not caused by an accidental purge of some files.
>> 
>> DS> After that we could get that error even after just a full backup,
>> DS> trying to restore immediately after it is finished.
>> 
>> DS> Yesterday we cleaned everything again and compiled (from SRPMs) the
>> DS> latest 2.1.28.
>> 
>> DS> We ran a full backup again (again all jobs concurrently), and the
>> DS> errors described here happen when we try to restore files from every
>> DS> job (except one where there are just 150 files).
>> 
>> DS> So there are two problems:
>> DS> - sometimes some files are restored with a larger size, while the
>> DS> first part of the file matches the original file exactly (these are
>> DS> not log files or dynamic files). This happens rarely (~one case per
>> DS> 5 jobs).
>> DS> - sometimes not all files are restored and tens of thousands are
>> DS> missing, for example:
>> DS>   Files Expected:         190,718
>> DS>   Files Restored:         166,097
>> DS> This happens more often (~one case per 2 jobs).
>> 
>> DS> Note that once the error happens, we can reproduce it on every restore
>> DS> at the same place, for the same file and with the same number of
>> DS> missing files (i.e. this is not a problem with the restore; it is most
>> DS> likely a problem with the volumes).
>> 
>> DS> Our planned tests:
>> DS> 1. we will do the same (concurrent jobs) but w/o GZIP
>> DS> 2. if it happens again we will set max jobs to 1 so every job runs
>> DS> alone, because AFAIR we didn't get errors in testing when we ran
>> DS> just one full backup job. It always happens when we run several at
>> DS> once (but I am not 100% sure, that's why we will test this)
>> DS> 3. if it still happens we will run with a normal kernel (to exclude
>> DS> the influence of Xen)
>> DS> 4. last, we will try w/o LVM (which would be harder)
>> 
>> DS> Regards
>> DS> P.S. sorry for my English :)
>> 
>> 
>> DS> Monday, July 23, 2007, 9:03:45 PM:
>> 
>> RN>> Doytchin Spiridonov wrote:
>> >>> Hello,
>> >>> 
>> >>> I am trying to identify a bug in Bacula and/or our system setup.
>> >>> 
>> >>> Has anyone had errors like this on restore:
>> >>> 
>> >>> Error: attribs.c:410 File size of restored file
>> >>> /home/bacula/res/b3/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm
>> >>> not correct. Original 3826291, restored 10620921.
>> >>> 
>> >>> - the file is not a log file or any other file that changed during
>> >>> the backup (in which case an error like the one above would be normal)
>> >>> 
>> >>> - the wrong file size is always larger than the original; if we keep
>> >>> only the first N bytes, where N is the correct file size, the original
>> >>> and restored files match; we noticed that the appended data is part of
>> >>> another file from the backup, not garbage. Note that this other
>> >>> file (part of which has been appended to the file with the wrong
>> >>> size) is restored correctly, so the only problem is that Bacula
>> >>> decides on the wrong file size and reads past the end of the file
>> >>> (this seems to be some internal buffer of Bacula, as the data is
>> >>> stored in the volumes using GZIP, so just reading further would break
>> >>> everything and the appended data would be garbage, not unzipped data).
>> 
>> RN>> This has been brought up several times within the last week, but never
>> RN>> with this much explanation and examination. I wonder if some of the
>> RN>> others who have experienced it (I do not know their names -- hopefully
>> RN>> they can chime in) can do the same thing for us. This is potentially
>> RN>> serious, it seems, if it is a widespread problem.
>> 
>> RN>> I think if the others can verify it, this should also be copied to
>> RN>> Bacula devel. I think I will try a large restore of my own today to see
>> RN>> what happens.
>> 
>> RN>> Please give the rest of the details of your setup, however -- you don't
>> RN>> even include the Bacula version, and that is a very basic piece of
>> RN>> information. Operating system (presumably RedHat Linux from the file you
>> RN>> backed up, but who knows), architecture... all would be useful.
>> 
>> 
>> DS> -------------------------------------------------------------------------
>> DS> This SF.net email is sponsored by: Splunk Inc.
>> DS> Still grepping through log files to find problems?  Stop.
>> DS> Now Search log events and configuration files using AJAX and a browser.
>> DS> Download your FREE copy of Splunk now >>  http://get.splunk.com/
>> DS> _______________________________________________
>> DS> Bacula-users mailing list
>> DS> Bacula-users@lists.sourceforge.net
>> DS> https://lists.sourceforge.net/lists/listinfo/bacula-users
>> 
>> 


