I personally don't care too much about keeping the existing git history once it's part of ASF, especially if dropping it makes things any easier (as you mention, squash/rebase can be difficult through merges). So I'd say we just do plan B: create a tarball of the current state (without the git history), and the content of that tarball is what goes through the IP clearance process and becomes the content of the initial commit when adding to the apache/daffodil-vscode repo.
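For what it's worth, here is a rough sketch of how such a tarball and its hash could be produced. It is only an illustration, not an official ASF procedure: the function names `make_snapshot_tarball` and `file_digest` are made up for this example, and it assumes the snapshot is simply the working tree minus the `.git` directory.

```python
import hashlib
import tarfile
from pathlib import Path


def make_snapshot_tarball(src_dir: str, out_path: str) -> None:
    """Write a gzip tarball of src_dir, leaving out the .git directory.

    out_path should live outside src_dir so the archive does not
    end up containing itself.
    """
    src = Path(src_dir).resolve()

    def skip_git(info: tarfile.TarInfo):
        # Returning None drops the member; for a directory this also
        # prevents tarfile from descending into it.
        return None if ".git" in Path(info.name).parts else info

    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src, arcname=src.name, filter=skip_git)


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

Something like `make_snapshot_tarball("daffodil-vscode", "/tmp/daffodil-vscode.tar.gz")` followed by `file_digest("/tmp/daffodil-vscode.tar.gz", "md5")` would give the file plus the md5 (or sha) hash Mike mentions below. Note that the gzip header records a timestamp, so re-creating the tarball later generally yields a different hash; the hash should be computed once, on the exact file that gets submitted.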
Note that I think the incubator will still want access to the existing repo so they can view the full git history. Understanding where everything came from and verifying the provenance is important to ensuring we have all the appropriate CLAs. So while the tarball is maybe what is officially voted on, they will want access to the repo.

That said, I don't think we are going to get CLAs for any Microsoft-contributed code. So either all Microsoft-contributed code will need to be kept MIT, or removed from the codebase. And it feels a bit odd to grant something to ASF where the original codebase stays MIT and isn't part of that grant.

I think understanding how much code still exists that is Microsoft/MIT is going to be important to getting this through the IP clearance process. So I'm curious how much of that original Microsoft code still exists? I assume since it was just example code it has mostly been replaced? If that's the case, we could potentially say Microsoft has no ownership of this code, and so their CLA and MIT license aren't necessary?

We should also have a good understanding of the dependencies. If any of them are not compatible with ALv2, then going through this process isn't even worth it until they are replaced. Do you have a list of the dependencies?

On 9/9/21 11:16 AM, Beckerle, Mike wrote:
> So the daffodil-vscode code-base wants to be granted to become part of the
> Daffodil project.
>
> One question arises which is "what is the contribution?" exactly.
>
> The normal way this is identified is by creating a tarball of the source files
> and specifying an sha or md5 hash of that file.
>
> However, this code base is perhaps different from usual.
>
> It started by creating a detached fork of the vscode debugger example code
> base. This is MIT-Licensed which is a compatible license.
>
> The files are then edited. There are around 100 commits on top of the base
> that came from the vscode debugger repository.
>
> So the contribution is that set of 100 commits - the patches/change-sets they
> represent.
>
> These commits often edit the original files of the vscode debugger example to
> add the daffodil-specific functionality. That is, the contribution material is
> in several cases intermingled in the lines of the existing files. That's ok I
> think so long as the modified file had MIT license.
>
> There's some value in preserving the 100 commits by our contributors, not
> squashing it down to one commit, though if it's really not sensible to proceed
> otherwise, we can choose to squash it down to one commit.
>
> Furthermore, the vscode debugger example repo itself had many commits in it.
> The current daffodil-vscode repo preserves all these commits as well. I don't
> see value in preserving these commits, and would rather they were squashed
> into a single "starting point" commit, with a dependencies file specifying the
> githash where we forked from, just so we can refer back if necessary.
>
> So as a starting suggestion (subject to discussion of other alternatives) is
> this:
>
> Plan A:
>
> 1. squash all commits up to and including the last Microsoft commit, together
>    into one.
> 2. rebase the remaining commits on top of that.
>      1. I'm a bit worried about this rebase. There are merge commits, etc. in
>         the history. I'm not sure this will just all rebase while preserving
>         all the commits, but maybe it will "just work"
> 3. create a "patch set" corresponding to the 100 or so commits that make up
>    the "contribution".
>      1. I don't know if this is even feasible for this many commits.
> 4. create a tar/zip of this aggregate patch set.
> 5. compute an md5 of this patch set.
>
> The patch set tar/zip file and its md5 hash are "the granted software".
>
> The problem with this idea is that there's no obvious way to review a patch
> set, shy of applying it.
>
> A better way may be to change steps 3 - 5 above to
>
> Plan B:
>
> 3. push the main branch to a new empty git repository
>    The point of this is to remove all historic stuff from the repository,
>    i.e., have a minimal git repo that contains only the contribution and the
>    single other commit it must be based on.
> 4. create a tarball of this git repository, and md5 hash of it
> 5. document that the contribution is from githash X (after the first commit)
>    to githash Y (the final commit) of this repository
>
> This has the advantage that the contribution is a self-contained review-able
> thing.
>
> Other ideas are welcome. (Plans C, D, etc) The only requirements I know of
> are:
>
> 1. a single file containing the contribution, and its md5 hash
> 2. a sensible way one can review the contents of this contribution file
> 3. preserve history of derivation from the vscode debugger example.
>
> Mike Beckerle | Principal Engineer
> mbecke...@owlcyberdefense.com <mailto:bhum...@owlcyberdefense.com>
> P +1-781-330-0412