I don't think I would try squashing anything. The IP clearance process will likely want the full and accurate git history. A tarball will be the result of that process, and we can add it to the git repo in one commit. Trying to maintain git history just complicates things for, I think, not much benefit, especially if things like bug/PR IDs in commit messages are incorrect.
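For what it's worth, here is a minimal sketch (Python, standard library only) of producing such a tarball and a hash of it. The function names, the choice to leave out the .git directory, and SHA-256 as the default digest are my assumptions for illustration, not anything the IP clearance process prescribes.

```python
import hashlib
import tarfile
from pathlib import Path


def make_source_tarball(src_dir, out_path):
    """Pack src_dir into a gzip tarball, leaving out any .git directory."""
    src = Path(src_dir).resolve()

    def skip_git(member):
        # A tarfile filter that returns None drops that member; dropping a
        # directory also keeps tarfile from descending into it.
        return None if ".git" in Path(member.name).parts else member

    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src, arcname=src.name, filter=skip_git)
    return Path(out_path)


def file_digest(path, algorithm="sha256"):
    """Return the hex digest of a file, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling make_source_tarball on a checkout and then file_digest(tarball, "md5") (or the default SHA-256) would give the file-plus-hash pair. Write the tarball somewhere outside the source directory so it does not end up inside itself.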
It's more important that files in the repo have correct license headers added, to make clear who the copyright holder of each file is.

On 9/9/21 9:44 PM, John Wass wrote:
> I had a few more (6) source files as modified..
>
> extension.ts
> debugAdapter.ts
> daffodilRuntime.ts
> daffodilDebug.ts
> adapter.test.ts
> activateDaffodilDebug.ts
>
>> It would seem an IDE (probably vscode!) decided to restyle/reindent this
>> code.
>
> We added opinionated code formatting... apparently trying to make this
> process as hard as possible :/
>
> That reformat commit was done on 08/25/2021, title of PR was Prettier.
> Looking prior to that commit might give a little better idea of what
> changed.
>
>> squash/rebase can be difficult through merges
>
> Here is a quick pass on (1) squashing the MS source in a single commit (2)
> placing that commit on top of an init commit in a repo (3) and then
> rewriting our commits on top of all of that.
>
> It preserves our authorship. Can be cleaned up a little bit still but I am
> not going to put time into it if we don't want this. I just wanted to note
> how it could look.
>
> https://github.com/jw3/rewrite-daffodil-vscode-1
>
> One issue I could see here is the linking of the example repo PR IDs in the
> commit messages will conflict once we start adding PRs in the new repo.
> Now would be the time to rewrite these commit messages and strip/modify
> those #ID tags.
>
> Thoughts on that rewrite repo?
>
> On Thu, Sep 9, 2021 at 5:42 PM Beckerle, Mike <mbecke...@owlcyberdefense.com>
> wrote:
>
>> So via some git trickery I was able to determine the "blended" files.
>>
>> I'm ignoring the various configuration files which are generally json
>> files.
>>
>> Of the ".ts" files only 3 are blended:
>>
>> src/debugAdapter.ts - 72 lines - only maybe 6 lines are different
>> src/extension.ts - 179 lines
>> src/tests/adapter.test.ts - 137 lines (50 of which are commented-out code)
>>
>> The delta between these files and the original files of the same name are
>> larger than expected due to changes in whitespace, and removal of ";" at
>> end of line (which I guess are optional in many places in typescript).
>>
>> It would seem an IDE (probably vscode!) decided to restyle/reindent this
>> code.
>>
>> So it's a bit hard to figure out what the "real" deltas are.
>>
>> src/debugAdapter.ts appears to be only trivially different. The name
>> MockDebugSession was replaced by DaffodilDebugSession, and "./mockDebug"
>> was changed to "./daffodilDebug".
>>
>> The other two files do appear to be where all the real blended code is.
>>
>> ________________________________
>> From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
>> Sent: Thursday, September 9, 2021 4:21 PM
>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>> Subject: Re: daffodil-vscode - how to package and identify the
>> contribution - some git questions
>>
>> Whether it's a PR or series of PRs, or a software grant, that still
>> doesn't resolve the issue of the blended files which are part MIT-licensed
>> original code, and part new code deltas by the daffodil-vscode contributors.
>>
>> We need to understand whether those blended files can be teased apart
>> somehow so that it is clear going forward what is an MIT-licensed library
>> and what is Apache Licensed.
>>
>> I just did a grep -R -i microsoft in a clone of the
>> openwhisk-vscode-extension and got zero hits. So no files still carry
>> microsoft copyright and in fact their NOTICES.txt file does not indicate
>> any dependency on MIT-licensed code at all. So I think
>> openwhisk-vscode-extension is not going to help us figure out how to surf
>> this issue.
>>
>> ________________________________
>> From: Steve Lawrence <slawre...@apache.org>
>> Sent: Thursday, September 9, 2021 3:54 PM
>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>> Subject: Re: daffodil-vscode - how to package and identify the
>> contribution - some git questions
>>
>> The concern is that this code was developed outside of Apache and so
>> didn't follow standard Apache process. From the IP clearance page:
>>
>> https://incubator.apache.org/ip-clearance/
>>
>>> Any code that was developed outside of the ASF SVN repository and
>>> our public mailing lists must be processed like this, even if the
>>> external developer is already an ASF committer.
>>
>> I suppose that submitting it as a PR does follow some of that process,
>> but there is maybe less assurance of ownership. Because it was not
>> developed in an ASF repository, that code is presumed to be owned by
>> you, multiple developers, or a company, and so that ownership must be
>> granted to ASF via the IP clearance process, with appropriate software
>> grant, CLA's, etc. (At least, that's my admittedly limited understanding
>> of the process).
>>
>> - Steve
>>
>> On 9/9/21 3:34 PM, John Wass wrote:
>>> Couldn't we (the vscode contributors) submit a series of PRs against the
>>> new repo to move the code, and just archive the example repo as-is?
>>>
>>> I noted some thoughts on that a while back
>>> https://github.com/jw3/example-daffodil-vscode/issues/77
>>>
>>> On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike <mbecke...@owlcyberdefense.com>
>>> wrote:
>>>
>>>> I know of one file in the repo which will have to be removed which is the
>>>> jpeg.dfdl.xsd file, which is there just as an example workspace.
>>>>
>>>> The copyright and provisions of that are not compatible with Apache
>>>> licensing.
>>>>
>>>> We can find a DFDL schema that we created that has Apache license to use
>>>> instead.
>>>>
>>>> For the other files under src, server, and build, can we generate a list
>>>> of files identifying which are:
>>>>
>>>> (a) original MIT-licensed, unmodified
>>>> (b) new - can be ASL
>>>> (c) blended - started from MIT-licensed source, modified with
>>>>     daffodil-vscode-specific changes.
>>>>
>>>> It is these blended files that are the problematic ones.
>>>>
>>>> ________________________________
>>>> From: Steve Lawrence <slawre...@apache.org>
>>>> Sent: Thursday, September 9, 2021 1:38 PM
>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>> Subject: Re: daffodil-vscode - how to package and identify the
>>>> contribution - some git questions
>>>>
>>>> Correct. For more information about Apache license compatibility:
>>>>
>>>> https://www.apache.org/legal/resolved.html
>>>>
>>>> MIT is Category A and is fine. EPL is Category B and is also okay, but
>>>> generally only in its binary form. So these top-level dependencies look
>>>> okay, assuming their transitive dependencies are also okay.
>>>>
>>>> We'll also need to verify the licenses of all code in the repo.
>>>> Hopefully little of that is original microsoft MIT and can be granted to
>>>> ASF and relicensed.
>>>>
>>>> On 9/9/21 1:30 PM, Beckerle, Mike wrote:
>>>>> The requirement, is that the entire dependency tree (transitively)
>>>>> cannot depend on any software that has an Apache-incompatible (aka
>>>>> restrictive) license.
>>>>>
>>>>> So we need the transitive closure of all dependencies.
>>>>>
>>>>> ________________________________
>>>>> From: Adam Rosien <a...@rosien.net>
>>>>> Sent: Thursday, September 9, 2021 12:44 PM
>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>> Subject: Re: daffodil-vscode - how to package and identify the
>>>>> contribution - some git questions
>>>>>
>>>>> (I don't understand the requirements of licencing + transitive
>>>>> dependencies, so I'm giving some surface level license info)
>>>>>
>>>>> "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
>>>>> http://logback.qos.ch/license.html
>>>>> "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL 1.0
>>>>> "co.fs2" %% "fs2-io" % "3.0.4" - MIT
>>>>> "com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
>>>>> "org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0
>>>>>
>>>>> On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien <a...@rosien.net> wrote:
>>>>>
>>>>>> I can relay the list of dependencies and their licenses.
>>>>>>
>>>>>> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence <slawre...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I personally don't care too much about having the existing git history
>>>>>>> once it's part of ASF, especially if it makes things any easier (as you
>>>>>>> mention, squash/rebase can be difficult through merges). So I'd say we
>>>>>>> just do plan B--create a tarball of the current state (without the git
>>>>>>> history), and the content of that tarball is what goes through the IP
>>>>>>> clearance process, and is the content of the initial commit when adding
>>>>>>> to the apache/daffodil-vscode repo.
>>>>>>>
>>>>>>> Note that I think the incubator will still want access to the existing
>>>>>>> repo so they can view the full git history. Understanding where
>>>>>>> everything came from and verifying the provenance is important to
>>>>>>> ensuring we have all the appropriate CLA's. So while the tarball is
>>>>>>> maybe what is officially voted on, they will want access to the repo.
>>>>>>>
>>>>>>> That said, I don't think we are going to get CLA's for any
>>>>>>> Microsoft-contributed code. So either all Microsoft-contributed code
>>>>>>> will need to be kept MIT, or removed from the codebase. And it feels a
>>>>>>> bit odd to grant something to ASF where the original codebase stays MIT
>>>>>>> and isn't part of that grant.
>>>>>>>
>>>>>>> I think understanding how much code still exists that is Microsoft/MIT
>>>>>>> is going to be important to getting this through the IP clearance
>>>>>>> process.
>>>>>>>
>>>>>>> So I'm curious how much of that original Microsoft code still exists? I
>>>>>>> assume since it was just example code it has mostly been replaced? If
>>>>>>> that's the case, we could potentially say Microsoft has no ownership of
>>>>>>> this code, and so their CLA and MIT license aren't necessary?
>>>>>>>
>>>>>>> We should also have a good understanding of the dependencies. If any of
>>>>>>> them are not compatible with ALv2, then going through this process
>>>>>>> isn't even worth it until they are replaced. Do you have a list of the
>>>>>>> dependencies?
>>>>>>>
>>>>>>> On 9/9/21 11:16 AM, Beckerle, Mike wrote:
>>>>>>>> So the daffodil-vscode code-base wants to be granted to become part of
>>>>>>>> the Daffodil project.
>>>>>>>>
>>>>>>>> One question arises which is "what is the contribution?" exactly.
>>>>>>>>
>>>>>>>> The normal way this is identified is by creating a tarball of the
>>>>>>>> source files and specifying an sha or md5 hash of that file.
>>>>>>>>
>>>>>>>> However, this code base is perhaps different from usual.
>>>>>>>>
>>>>>>>> It started by creating a detached fork of the vscode debugger example
>>>>>>>> code base. This is MIT-Licensed which is a compatible license.
>>>>>>>>
>>>>>>>> The files are then edited. There are around 100 commits on top of the
>>>>>>>> base that came from the vscode debugger repository.
>>>>>>>>
>>>>>>>> So the contribution is that set of 100 commits - the
>>>>>>>> patches/change-sets they represent.
>>>>>>>>
>>>>>>>> These commits often edit the original files of the vscode debugger
>>>>>>>> example to add the daffodil-specific functionality. That is, the
>>>>>>>> contribution material is in several cases intermingled in the lines of
>>>>>>>> the existing files. That's ok I think so long as the modified file had
>>>>>>>> MIT license.
>>>>>>>>
>>>>>>>> There's some value in preserving the 100 commits by our contributors,
>>>>>>>> not squashing it down to one commit, though if it's really not
>>>>>>>> sensible to proceed otherwise, we can choose to squash it down to one
>>>>>>>> commit.
>>>>>>>>
>>>>>>>> Furthermore, the vscode debugger example repo itself had many commits
>>>>>>>> in it. The current daffodil-vscode repo preserves all these commits as
>>>>>>>> well. I don't see value in preserving these commits, and would rather
>>>>>>>> they were squashed into a single "starting point" commit, with a
>>>>>>>> dependencies file specifying the githash where we forked from, just so
>>>>>>>> we can refer back if necessary.
>>>>>>>>
>>>>>>>> So as a starting suggestion (subject to discussion of other
>>>>>>>> alternatives) is this:
>>>>>>>>
>>>>>>>> Plan A:
>>>>>>>>
>>>>>>>>  1. squash all commits up to and including the last Microsoft commit,
>>>>>>>>     together into one.
>>>>>>>>  2. rebase the remaining commits on top of that.
>>>>>>>>      1. I'm a bit worried about this rebase. There are merge commits,
>>>>>>>>         etc. in the history. I'm not sure this will just all rebase
>>>>>>>>         while preserving all the commits, but maybe it will "just work"
>>>>>>>>  3. create a "patch set" corresponding to the 100 or so commits that
>>>>>>>>     make up the "contribution".
>>>>>>>>      1. I don't know if this is even feasible for this many commits.
>>>>>>>>  4. create a tar/zip of this aggregate patch set.
>>>>>>>>  5. compute an md5 of this patch set.
>>>>>>>>
>>>>>>>> The patch set tar/zip file and its md5 hash are "the granted
>>>>>>>> software".
>>>>>>>>
>>>>>>>> The problem with this idea is that there's no obvious way to review a
>>>>>>>> patch set, shy of applying it.
>>>>>>>>
>>>>>>>> A better way may be to change steps 3 - 5 above to
>>>>>>>>
>>>>>>>> Plan B:
>>>>>>>>
>>>>>>>>  3. push the main branch to a new empty git repository
>>>>>>>>     The point of this is to remove all historic stuff from the
>>>>>>>>     repository, i.e., have a minimal git repo that contains only the
>>>>>>>>     contribution and the single other commit it must be based on.
>>>>>>>>  4. create a tarball of this git repository, and md5 hash of it
>>>>>>>>  5. document that the contribution is from githash X (after the first
>>>>>>>>     commit) to githash Y (the final commit) of this repository
>>>>>>>>
>>>>>>>> This has the advantage that the contribution is a self-contained
>>>>>>>> review-able thing.
>>>>>>>>
>>>>>>>> Other ideas are welcome. (Plans C, D, etc) The only requirements I
>>>>>>>> know of are:
>>>>>>>>
>>>>>>>>  1. a single file containing the contribution, and its md5 hash
>>>>>>>>  2. a sensible way one can review the contents of this contribution
>>>>>>>>     file
>>>>>>>>  3. preserve history of derivation from the vscode debugger example.
>>>>>>>>
>>>>>>>> Mike Beckerle | Principal Engineer
>>>>>>>>
>>>>>>>> mbecke...@owlcyberdefense.com <mailto:bhum...@owlcyberdefense.com>
>>>>>>>>
>>>>>>>> P +1-781-330-0412