On Tue, 31 Jan 2023 14:05:05 +0100 (CET) Andrea Calligaris <recl...@email.it> wrote:
Dear Andrea,

> Do you think it should?
>
> I'm not interested in a short-term implementation, I'm more
> interested if you think that it should, or if you have an opinion
> against it. In my opinion it should, because someone may rely on
> hardlinks to manage their data, and if you want to back up your stuff
> in a tar archive, with the current sbase version you are losing the
> hardlink connection and it just raw-copies the file.
>
> Or should one not rely on hardlinks at all, because X, etc.
>
> Let me know what you think, thanks.

Just as a side-note: We are talking about support during "compression"
here. sbase-tar already supports "decompression" of symbolic and hard
links.

I think sbase-tar should support hardlinks in general, but it adds
complications that would require an elegant and scalable (!) solution.

First off, detecting whether a file has multiple hard links can be done
in a POSIX-compliant way [0]: the stat structure contains st_dev and
st_ino (which together give a file a unique ID that you can use at
compression time to keep tabs on it) and even st_nlink (the number of
hard links to a file, so you only have to keep tabs on files with
st_nlink > 1). It is also POSIX-tar-compliant (UStar, pax) to add hard
links to archives.

Still, there is no upper bound on the number of hard links you might
encounter during "compression", and it would be ugly and inconsistent
to impose an arbitrary limit on that (with a static buffer). You could
instead create and fill an array of structs of the form

	struct {
		dev_t dev;
		ino_t ino;
		char *fname;
	} possible_hard_link_target;

whenever you encounter a file with st_nlink > 1. Every subsequently
encountered file with st_nlink > 1 would then have to be checked
against that array, but you sacrifice the worst-case memory
consumption, which is currently O(1), and make it O(n) in the case
where every file is a hard link, which in turn makes a naïve linear
search on the array relatively inefficient.

Instead, you could use a linked list, a hash table or some binary tree
over dev and ino, but that leads to cache misses and lots of pointer
madness. In this case it might be much better to keep the array
manually sorted (sorting by "dev.ino", i.e. dev is the "major" part and
ino the "minor" part) and then run a binary search on it, giving you
logarithmic lookup time. It may be a close call whether shuffling the
array around and saving time in the search beats a simple
O(1)-addition-time at the cost of linear search time, but it should not
be forgotten how good CPUs have become at memmove(). (A rough sketch of
this approach follows below my signature.)

No matter how you approach it, you will have to get your feet wet and
sacrifice the constant memory consumption, but I think the gains
outweigh the costs.

With best regards

Laslo

[0]: https://pubs.opengroup.org/onlinepubs/009696799/basedefs/sys/stat.h.html
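
P.S.: Here is a rough, untested sketch of the sorted-array idea, just to
make the shape of it concrete; it is not sbase code, and the names
(struct linkent, linktab_find_or_add) are made up for illustration:

	/*
	 * Sketch: remember (st_dev, st_ino) of files with st_nlink > 1
	 * in an array kept sorted by dev, then ino, and look entries up
	 * with a binary search.
	 */
	#include <sys/stat.h>
	#include <stdlib.h>
	#include <string.h>

	struct linkent {
		dev_t  dev;
		ino_t  ino;
		char  *fname;  /* name the inode was first archived under */
	};

	static struct linkent *tab;
	static size_t tablen, tabcap;

	static int
	cmp(dev_t dev, ino_t ino, const struct linkent *e)
	{
		if (dev != e->dev)
			return dev < e->dev ? -1 : 1;
		if (ino != e->ino)
			return ino < e->ino ? -1 : 1;
		return 0;
	}

	/*
	 * If (st->st_dev, st->st_ino) was seen before, return the name it
	 * was first archived under, so the caller can emit a hard-link
	 * entry instead of the file contents.  Otherwise remember it under
	 * "fname" and return NULL.  Lookup is O(log n); insertion shifts
	 * the tail of the array with memmove().
	 */
	static const char *
	linktab_find_or_add(const struct stat *st, const char *fname)
	{
		size_t lo = 0, hi = tablen, mid;
		int c;

		while (lo < hi) {
			mid = lo + (hi - lo) / 2;
			c = cmp(st->st_dev, st->st_ino, &tab[mid]);
			if (c == 0)
				return tab[mid].fname;
			if (c < 0)
				hi = mid;
			else
				lo = mid + 1;
		}
		if (tablen == tabcap) {
			tabcap = tabcap ? tabcap * 2 : 64;
			tab = realloc(tab, tabcap * sizeof(*tab));
			if (!tab)
				exit(1); /* a real tool would handle this */
		}
		memmove(tab + lo + 1, tab + lo, (tablen - lo) * sizeof(*tab));
		tab[lo].dev = st->st_dev;
		tab[lo].ino = st->st_ino;
		tab[lo].fname = strdup(fname);
		tablen++;
		return NULL;
	}

The caller would only consult this for regular files with
st_nlink > 1: a non-NULL return means the inode was already archived,
so you write a ustar header with typeflag '1' and the returned name as
the linkname and skip the file data; a NULL return means the file is
archived as usual.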