I finally had the chance to restart the storage daemon this morning in
order to disable autoxflate.
Today's copy jobs that ran afterwards finished without any warning, so I
think it's probably an issue with autoxflate (on replication?). I can also
successfully restore those new copies.
Best regards,
Samuel
On Monday, February 26, 2024 at 12:19:50 PM UTC+1 Samuel wrote:
> Yes, that's right. In this case though it's a consolidated incremental
> which was copied. However, the warnings also occur during migration jobs of
> consolidated fulls. The warning has never occurred while copying/migrating
> a non-consolidated incremental.
>
> This is all working just fine except for a couple of jobs which all have
> the following in common: filedaemon is on another host than director, they
> use LZ4 compression and autoxflate inflates on replication (though it's
> also enabled for all the other jobs but doesn't have any effect since they
> don't use compression).
>
> I use the SQL Query selection type:
> Selection Type = SQL Query
> Selection Pattern = "
> SELECT DISTINCT Job.JobId, Job.StartTime
> FROM Job, Pool
> WHERE Pool.Name IN ('AI-Consolidated', 'AI-Incremental') AND
> Pool.PoolId = Job.PoolId AND
> Job.Type = 'B' AND
> Job.JobStatus IN ('T', 'W') AND
> Job.Level IN ('I') AND
> Job.JobBytes > 0 AND
> Job.JobId NOT IN (
> SELECT PriorJobId
> FROM Job AS copy
> WHERE copy.Type IN ('B', 'C', 'A') AND
> copy.JobStatus IN ('T', 'W') AND
> copy.PriorJobId != 0
> )
> ORDER by Job.StartTime;"
>
> No, there's no concurrent writing going on currently. Before updating to
> Bareos 23 I did have devices with concurrent writing enabled, but that was
> before these warnings happened.
>
> Best regards,
> Samuel
>
> On Monday, February 26, 2024 at 11:51:59 AM UTC+1 Sebastian Sura wrote:
>
>> Hi Samuel
>>
>> i just wanted to make sure i completely understand the situation. You
>> have an always incremental job with compression
>> turned on (which algorithm do you use ?) and then you have a copy job
>> that copies the consolidated full job (on the same sd)
>> to tape with a device that auto inflates on write.
>>
>> Do i have that right ?
>>
>> Does the device where the job gets consolidated onto allow concurrent
>> writes from multiple jobs ?
>>
>> What selection type has your copy job ?
>>
>> Kind Regards
>> Sebastian Sura
>> Am 26.02.24 um 11:32 schrieb 'Samuel' via bareos-users:
>>
>> Hi again.
>>
>> So, as expected, the warning occurred again when job 94511 was copied
>> (but also two new warnings for this specific job):
>> Warning: dird/catreq.cc:608 MD5 digest not same File=3 as attributes=2
>> Warning: dird/catreq.cc:608 MD5 digest not same File=18 as attributes=17
>> Warning: dird/catreq.cc:608 MD5 digest not same File=23 as attributes=22
>>
>> The restore of the copy also errored in the same manner as before:
>> 2024-02-26 10:43:50 bareos-fd JobId 95784: Error: findlib/attribs.cc:381
>> File size of restored file
>> /tmp/bareos-restores/var/backup/old/2024-02-07_23:30/backup_www_1100CC.tar.gz
>>
>> not correct. Original 738212473, restored 442105856.
>> 2024-02-26 10:44:10 bareos-fd JobId 95784: Error: findlib/attribs.cc:381
>> File size of restored file
>> /tmp/bareos-restores/var/backup/old/2024-02-12_23:30/backup_www_1100CC.tar.gz
>>
>> not correct. Original 856739265, restored 783548416.
>> 2024-02-26 10:44:17 bareos-fd JobId 95784: Error: findlib/attribs.cc:381
>> File size of restored file
>> /tmp/bareos-restores/var/backup/old/2024-02-14_23:30/backup_www_1100CC.tar.gz
>>
>> not correct. Original 856739984, restored 31064064.
>>
>> For now I'll try disabling autoxflate and see if that solves the issue.
>>
>> Best regards,
>> Samuel
>>
>>
>> On Friday, February 23, 2024 at 2:02:28 PM UTC+1 Samuel wrote:
>>
>>> Thanks for looking into it!
>>>
>>> Hm, there seems to be no file with fileindex=3 in the database for the
>>> copy:
>>> Enter SQL query: select * from file where jobid=94146 and fileindex=3;
>>> No results to list.
>>>
>>> All files that are being backed up are just regular files, nothing
>>> special.
>>>
>>> The copy runs locally on the director's host from its SD to the same SD.
>>>
>>> Here's file 2 and 3 of today's consolidated incremental backup which
>>> will be copied tomorrow and most likely lead to the same warning again as
>>> it has for the last few days (for this specific fileset it always warns
>>> about `File=3 attributes=2`):
>>>
>>> Enter SQL query: select *, decode_lstat(lstat) from file where
>>> jobid=94511 and fileindex=2;
>>>
>>> +-------------+-----------+-------+---------+----------+--------+--------+--------+----------------------------------------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------+
>>> | fileid | fileindex | jobid | pathid | deltaseq | markid | fhinfo
>>> | fhnode | lstat |
>>> md5 | name | decode_lstat
>>> |
>>>
>>> +-------------+-----------+-------+---------+----------+--------+--------+--------+----------------------------------------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------+
>>> | 484,575,603 | 2 | 94511 | 910,901 | 0 | 0 | 0
>>> | 0 | P0D CgA4 IGk B A i A sADp5 BAA FgAo BlxAt1 BlxASF BlxVYH A A d
>>> | D/VouduY5TF4KFacjRE7Hw | backup_www_1100CC.tar.gz |
>>> (64771,655416,33188,1,0,34,0,738212473,4096,1441832,1707346805,1707345029,1707431431,0,0,29)
>>>
>>> |
>>>
>>> +-------------+-----------+-------+---------+----------+--------+--------+--------+----------------------------------------------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------+
>>>
>>> Enter SQL query: select *, decode_lstat(lstat) from file where
>>> jobid=94511 and fileindex=3;
>>>
>>> +-------------+-----------+-------+---------+----------+--------+--------+--------+---------------------------------------------------------------+------------------------+-----------------------------+-------------------------------------------------------------------------------------------+
>>> | fileid | fileindex | jobid | pathid | deltaseq | markid | fhinfo
>>> | fhnode | lstat |
>>> md5 | name | decode_lstat
>>> |
>>>
>>> +-------------+-----------+-------+---------+----------+--------+--------+--------+---------------------------------------------------------------+------------------------+-----------------------------+-------------------------------------------------------------------------------------------+
>>> | 484,575,604 | 3 | 94511 | 910,901 | 0 | 0 | 0
>>> | 0 | P0D CgA5 IGk B A i A BA9gA BAA IHw BlxAt8 BlxASH BlxVYH A A d |
>>> J9h+OdR/6XCqhqpJdJvYtw | backup_databases_1100CC.tar |
>>> (64771,655417,33188,1,0,34,0,17029120,4096,33264,1707346812,1707345031,1707431431,0,0,29)
>>>
>>> |
>>>
>>> +-------------+-----------+-------+---------+----------+--------+--------+--------+---------------------------------------------------------------+------------------------+-----------------------------+-------------------------------------------------------------------------------------------+
>>>
>>>
>>> Restore of this job (jobid=94511) works just fine:
>>>
>>> 23-Feb 13:44 bareos-sd JobId 94559: Releasing device
>>> "FileDevice-ReadOnly-0005" (/backup_1/bareos).
>>> 23-Feb 13:44 bareos-dir JobId 94559: Max configured use duration=82,800
>>> sec. exceeded. Marking Volume "AI-Consolidated-1771" as Used.
>>> 23-Feb 13:44 bareos-dir JobId 94559: Bareos bareos-dir
>>> 23.0.2~pre32.0a0e55739 (31Jan24):
>>> Build OS: Ubuntu 20.04.5 LTS
>>> JobId: 94559
>>> Job: Restore-Files.2024-02-23_13.43.47_38
>>> Restore Client: "bareos-fd" 23.0.2~pre32.0a0e55739 (31Jan24)
>>> Ubuntu 20.04.5 LTS,ubuntu
>>> Start time: 23-Feb-2024 13:43:49
>>> End time: 23-Feb-2024 13:44:20
>>> Elapsed time: 31 secs
>>> Files Expected: 33
>>> Files Restored: 33
>>> Bytes Restored: 9,143,514,060
>>> Rate: 294952.1 KB/s
>>> FD Errors: 0
>>> FD termination status: OK
>>> SD termination status: OK
>>> Bareos binary info: Bareos community build (UNSUPPORTED): Get
>>> professional support from https://www.bareos.com
>>> Job triggered by: User
>>> Termination: Restore OK
>>>
>>> I'll also try restoring its copy from tape once it has been copied.
>>>
>>> Best regards,
>>> Samuel
>>> On Friday, February 23, 2024 at 1:16:32 PM UTC+1 Sebastian Sura wrote:
>>>
>>>> I meant to say File 3 has no stream 1, sorry for the confusion!
>>>> Am 23.02.24 um 13:15 schrieb Sebastian Sura:
>>>>
>>>> Hi Samuel
>>>>
>>>> thanks for the gathering this info. Ill look into the bscan issue as
>>>> well though i think this might be a known issue.
>>>> Regardless, regarding your actual issue: The bscan output shows that
>>>> File 3 is missing its attributes (it has no stream 3).
>>>> This confuses the director as it never got told that file 3 now started
>>>> getting backed up and this is why you get the warning
>>>> message from the director.
>>>>
>>>> The same is happening during the restore: the filedaemon never got told
>>>> that now a new file was started (because it never got the attribute stream)
>>>> so it basically merged both File 2 and File 3 into one file. Even with
>>>> this in mind, i think the restore should have caught that and issues a
>>>> warning.
>>>> Ill look into why this did not happen.
>>>>
>>>> If you add up all the data records (stream=2) with fileid=2 and
>>>> fileid=3, you will get 738212473 bytes, which is exactly what the
>>>> filedaemon reported
>>>> as size for file 2. Since the log contains the size it expected i
>>>> imagine that you could manually restore file 3 by splitting file 2 into
>>>> two.
>>>>
>>>> Can you check which file fileid 3 corresponds to ? Is it a special kind
>>>> of file or just a normal one ?
>>>>
>>>> Ill try to see if i can reproduce your issue in the copy system test.
>>>> Do you do a local copy (so copy to the same sd) or a remote one?
>>>>
>>>> Kind Regards
>>>>
>>>> Sebastian Sura
>>>> Am 23.02.24 um 12:56 schrieb 'Samuel' via bareos-users:
>>>>
>>>> Hi Sebastian,
>>>>
>>>> After looking at the logs some more I noticed that all jobs with this
>>>> warning have in common that they're using compression (LZ4) and autoxflate
>>>> on replication.
>>>> Perhaps this combination is still not entirely fixed yet. I'll try
>>>> disabling autoxflate in the coming days.
>>>>
>>>> The entire result of bscan is in the attachment. However here's the
>>>> last part of it where bscan seems to abort (?):
>>>> $ sudo -u bareos bscan -b md5_digest_error_copy_only.bsr --list-records
>>>> TapeDevice2 2>&1 | tee records.txt
>>>> ...
>>>> bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936
>>>> FileIndex=23 Stream=2 len=65536
>>>> bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936
>>>> FileIndex=23 Stream=2 len=65536
>>>> bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936
>>>> FileIndex=23 Stream=2 len=65536
>>>> bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936
>>>> FileIndex=23 Stream=2 len=22528
>>>> bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936
>>>> FileIndex=23 Stream=3 len=16
>>>> bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936
>>>> FileIndex=24 Stream=1 len=100
>>>> bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936
>>>> FileIndex=25 Stream=1 len=83
>>>> bscan: stored/bscan.cc:494-0 Record: SessId=3480 SessTim=1707222936
>>>> FileIndex=-5 Stream=94145 len=193
>>>> bscan: stored/bscan.cc:681-0 Could not find SessId=3480
>>>> SessTime=1707222936 for EOS record.
>>>> Records would have been added or updated in the catalog:
>>>> 0 Media
>>>> 1 Pool
>>>> 0 Job
>>>> 0 File
>>>> 0 RestoreObject
>>>> 23-Feb 12:05 bscan JobId 0: Releasing device "TapeDevice2"
>>>> (/dev/tape/by-id/scsi-35000e111c71ac0bf-nst).
>>>>
>>>> I also tried restoring the copy which ends in error.
>>>>
>>>> Most files are restored successfully except for one file which happens
>>>> to have FileIndex=2 (corresponding to the `attributes=2` in the warning?):
>>>> Enter SQL query: select path,name from file,path where jobid=94146 and
>>>> fileindex=2 and file.pathid=path.pathid;
>>>> +-----------------------------------+--------------------------+
>>>> | path | name |
>>>> +-----------------------------------+--------------------------+
>>>> | /var/backup/old/2024-02-07_23:30/ | backup_www_1100CC.tar.gz |
>>>> +-----------------------------------+--------------------------+
>>>>
>>>> A full restore of the original non-copy backup jobs works fine, as do
>>>> restores of other copy jobs on tape of filesets that don't use compression.
>>>> The original job that was copied (93682) doesn't exist anymore; I think
>>>> it was an always-incremental consolidated incremental backup which got
>>>> consolidated again the following day.
>>>>
>>>> Best regards,
>>>> Samuel
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "bareos-users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/bareos-users/e766e8c6-5b58-424f-816f-f7c5d7252180n%40googlegroups.com
>>>>
>>>> <https://groups.google.com/d/msgid/bareos-users/e766e8c6-5b58-424f-816f-f7c5d7252180n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>> --
>>>> Sebastian Sura [email protected]
>>>> Bareos GmbH & Co. KG Phone: +49 221 630693-0
>>>> https://www.bareos.com
>>>> Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
>>>> Komplementär: Bareos Verwaltungs-GmbH
>>>> Geschäftsführer: Stephan Dühr, Jörg Steffens, Philipp Storz
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "bareos-users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>>
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/bareos-users/111cecfe-d3e1-4eb9-8a0e-f053a2777d0f%40bareos.com
>>>>
>>>> <https://groups.google.com/d/msgid/bareos-users/111cecfe-d3e1-4eb9-8a0e-f053a2777d0f%40bareos.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>> --
>>>> Sebastian Sura [email protected]
>>>> Bareos GmbH & Co. KG Phone: +49 221 630693-0
>>>> https://www.bareos.com
>>>> Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
>>>> Komplementär: Bareos Verwaltungs-GmbH
>>>> Geschäftsführer: Stephan Dühr, Jörg Steffens, Philipp Storz
>>>>
>>>> --
>> You received this message because you are subscribed to the Google Groups
>> "bareos-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/bareos-users/3823a22d-b49e-4892-8b45-6c932ce7e699n%40googlegroups.com
>>
>> <https://groups.google.com/d/msgid/bareos-users/3823a22d-b49e-4892-8b45-6c932ce7e699n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>> --
>> Sebastian Sura [email protected]
>> Bareos GmbH & Co. KG Phone: +49 221 630693-0
>> https://www.bareos.com
>> Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
>> Komplementär: Bareos Verwaltungs-GmbH
>> Geschäftsführer: Stephan Dühr, Jörg Steffens, Philipp Storz
>>
>>
--
You received this message because you are subscribed to the Google Groups
"bareos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/bareos-users/67455f46-6182-4fb6-9be1-e8969ae0fb3dn%40googlegroups.com.