Yaroslav Halchenko wrote:
> Original motivation was expressed here:
> https://github.com/data-git/datagit/issues/1
> 
> Throw-away limited-view (i.e. some keys not present) could be useful
> e.g. to test for correct operation of a given data analysis pipeline
> given a subset of files (which were 'get'ed)

I think that if it's safe to hard-link in this situation, it could do it
by default without a new option, unless it requires expensive checks.

So, is it safe to hard-link in this situation? The complicating factors
I can think of are:

* Anything that changes the original file content would change the hard link
  too. But these are annexed objects; nothing should be changing their
  content.
* Hard-linked files share an owner. So if the source and destination
  repositories are being used by different users, hard linking would not
  be a good idea.
* Hard-linked files also share permissions. So if core.sharedRepository
  has different settings in the source and destination repositories,
  the files in them are supposed to have different modes, and hard linking
  cannot be used.
* This could weaken numcopies enforcement. If the goal is to ensure 2 copies
  of a given file, then having both copies really be a single hard-linked
  file would result in a lower than desired redundancy.

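The shared-owner and shared-permissions points above come straight from how
hard links work: both names refer to the same inode, so metadata is shared
too. A minimal sketch (assumed throwaway paths, not git-annex's actual
object layout; `stat -c` is GNU coreutils):

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
echo "annexed content" > "$tmp/source.obj"

# Hard link: a second name for the same inode.
ln "$tmp/source.obj" "$tmp/dest.obj"

# Both names report the same inode number...
stat -c '%i' "$tmp/source.obj"
stat -c '%i' "$tmp/dest.obj"

# ...so changing the mode via one name changes it for the other too.
chmod 0600 "$tmp/source.obj"
stat -c '%a' "$tmp/dest.obj"   # prints 600

rm -rf "$tmp"
```

This is why differing `core.sharedRepository` settings (or differing owners)
between the two repositories rule hard linking out: there is only one set of
metadata to go around.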
Of course, git-annex already uses cp --reflink=auto in this case, so
if used on a filesystem that supports CoW, it'll already be as fast as a
hard link would be while avoiding all of the above complications (except
the numcopies one I suppose).

-- 
see shy jo
