Giannis Economou wrote at about 19:12:56 +0200 on Thursday, March 16, 2023:
> In my v4 pool are collisions still generating _0, _1, _2 etc filenames
> in the pool/ ?
According to the code in Lib.pm, it appears that unlike v3 there is no
underscore -- the collision index is just an (unsigned) long appended
to the end of the 16-byte digest.
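A toy illustration of the two naming schemes as described in this thread
(the exact on-disk encoding is whatever Lib.pm does; the hex rendering of
the appended long below is my assumption, not BackupPC's actual format):

```python
def v3_pool_name(digest_hex: str, collision: int) -> str:
    # v3: the first file keeps the bare digest; subsequent collisions
    # get "_0", "_1", ... appended, as in the docs example.
    if collision == 0:
        return digest_hex
    return f"{digest_hex}_{collision - 1}"

def v4_pool_name(digest_hex: str, collision: int) -> str:
    # v4 (as described above): no underscore; an unsigned long is
    # appended to the digest, rendered here as extra hex digits.
    if collision == 0:
        return digest_hex
    return f"{digest_hex}{collision:08x}"

# v3-style names match the docs example; v4-style names stay pure hex,
# which is why the find patterns below for "*_*" come up empty on v4.
print(v3_pool_name("123456789abcdef0", 1))  # 123456789abcdef0_0
print(v4_pool_name("123456789abcdef0", 1))  # 123456789abcdef000000001
```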
>
> (as in the example from the docs mentions:
> __TOPDIR__/pool/1/2/3/123456789abcdef0
> __TOPDIR__/pool/1/2/3/123456789abcdef0_0
> __TOPDIR__/pool/1/2/3/123456789abcdef0_1
> )
That is for v3 as indicated by the 3-layer pool.
>
> I am using compression (I only have cpool/ dir) and I am asking because
> on both servers running:
> find cpool/ -name "*_0" -print
> find cpool/ -name "*_*" -print
>
> brings zero results.
Try:
find /var/lib/backuppc/cpool/ -type f -regextype grep \
    ! -regex ".*/[0-9a-f]\{32\}" ! -name "LOCK" ! -name "poolCnt"
>
>
> Thank you.
>
>
> > On 16/3/2023 6:30 p.m., [email protected] wrote:
> > Rob Sheldon wrote at about 08:31:17 -0700 on Thursday, March 16, 2023:
> > > On Thu, Mar 16, 2023, at 7:43 AM, [email protected] wrote:
> > > >
> > > > Rob Sheldon wrote at about 23:54:51 -0700 on Wednesday, March 15, 2023:
> > > > > There is no reason to be concerned. This is normal.
> > > >
> > > > It *should* be extremely rare -- once in a blue moon -- to randomly
> > > > have an md5sum collision, as in 1.47*10^-29
> > >
> > > Why are you assuming this is "randomly" happening? Any time an
> > > identical file exists in more than one place on the client filesystem,
> > > there will be a collision. This is common in lots of cases. Desktop
> > > environments frequently have duplicated files scattered around. I used
> > > BackupPC for website backups; my chain length was approximately equal to
> > > the number of WordPress sites I was hosting.
> >
> > You are simply not understanding how file de-duplication and pool
> > chains work in v4.
> >
> > Identical files contribute only a single chain instance -- no matter
> > how many clients you are backing up and no matter how many backups you
> > save of each client. This is what de-duplication does.
> >
> > The fact that they appear on different clients and/or in different
> > parts of the filesystem is reflected in the attrib files in the pc
> > subdirectories for each client. This is where the metadata is stored.
> >
> > Chain lengths have to do with pool storage of the file contents
> > (ignoring metadata). Lengths greater than 1 only occur if you have
> > md5sum hash collisions -- i.e., two files (no matter on what client or
> > where in the filesystem) with non-identical contents but the same
> > md5sum hash.
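[The distinction above can be sketched in a few lines -- a toy pool keyed
by digest, where duplicate content de-dups to a single entry and only a
genuine hash collision lengthens a chain. This is an illustration, not
BackupPC's actual data structures:]

```python
import hashlib

# Toy model: pool maps digest -> list of distinct file contents.
# De-duplication: identical content never grows the list.
# A chain (length > 1) appears only when *different* contents
# happen to share the same md5 digest.
pool: dict[str, list[bytes]] = {}

def add_to_pool(content: bytes) -> int:
    """Store content; return its position in the chain for its digest."""
    digest = hashlib.md5(content).hexdigest()
    chain = pool.setdefault(digest, [])
    if content in chain:        # duplicate file: same slot, chain unchanged
        return chain.index(content)
    chain.append(content)       # new content; only a collision grows a chain
    return len(chain) - 1

# Identical files from any number of clients land in the same slot:
assert add_to_pool(b"wp-config") == 0
assert add_to_pool(b"wp-config") == 0   # de-duplicated, chain stays length 1
assert len(pool[hashlib.md5(b"wp-config").hexdigest()]) == 1
```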
> >
> > Such collisions are statistically exceedingly unlikely to occur on
> > normal data where you haven't worked hard to create such collisions.
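[For scale, a standard birthday-bound approximation puts the chance of any
accidental md5 collision among millions of files at a vanishingly small
number. This is only a sketch of the order of magnitude -- the 1.47*10^-29
figure quoted earlier presumably comes from its own calculation:]

```python
# Birthday-bound approximation for n files hashed with md5 (128-bit output):
#   P(any collision) ~= n * (n - 1) / (2 * 2**128)
def collision_probability(n: int) -> float:
    return n * (n - 1) / (2.0 * 2**128)

# Using the ~7.4M-file pool size from the stats below:
p = collision_probability(7_395_292)
print(f"{p:.3e}")   # roughly 8e-26 -- effectively zero
```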
> >
> > For example, on my backup server:
> > Pool is 841.52+0.00GiB comprising 7395292+0 files and 16512+1
> > directories (as of 2023-03-16 01:11),
> > Pool hashing gives 0+0 repeated files with longest chain 0+0,
> >
> > I strongly suggest you read the documentation on BackupPC before
> > making wildly erroneous assumptions about chains. You can also look at
> > the code in BackupPC_refCountUpdate which defines how $fileCntRep and
> > $fileCntRepMax are calculated.
> >
> > Also, if what you said were true, the OP would have multiple chains --
> > presumably one for each distinct file that is "scattered around".
> >
> > If you are using v4.x and have pool hashing with such collisions, it
> > would be great to see them. I suspect you are either using v3 or you
> > are using v4 with a legacy v3 pool.
> >
> > > > You would have to work hard to artificially create such collisions.
> > >
> > > $ echo 'hello world' > ~/file_a
> > > $ cp ~/file_a ~/file_b
> > > $ [ "$(cat ~/file_a | md5sum)" = "$(cat ~/file_b | md5sum)" ] && echo
> > "MATCH"
> > >
> > > Rob Sheldon
> > > Contract software developer, devops, security, technical lead
> > >
> > >
> > > _______________________________________________
> > > BackupPC-users mailing list
> > > [email protected]
> > > List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
> > > Wiki: https://github.com/backuppc/backuppc/wiki
> > > Project: https://backuppc.github.io/backuppc/
> >
> >
>
>