Re: daffodil-vscode - how to package and identify the contribution - some git questions

Steve Lawrence Fri, 10 Sep 2021 04:42:25 -0700

I think the concern from ASF, and why they have this IP clearance
process, is that the copyright ownership of these files is not clear. It
wasn't done in a fork of a Daffodil repo, and there are contributions
from multiple developers, with little public oversight from Daffodil. I
think from the ASF's perspective, this code did not follow the ASF
process, and is assumed to be owned by the contributors or their
companies. There is no license/copyright in the code specifying
otherwise, so I think ASF must assume the worst, and require a software
grant.


Also note that even prototype code has a copyright owner and a license.
Copying it into a PR doesn't change that. If you were to throw away this
code and start from scratch following the ASF process, then it wouldn't
be a problem. But if the plan is to copy prototype code not owned by ASF
into a PR, then there are ownership concerns.

If all this work was done in a fork of the apache/daffodil-vscode repo
from a single contributor, then I think maybe the assumption from ASF is
the code was intended to be part of the main repo and implicitly granted
to ASF via the PR process.


On 9/9/21 4:05 PM, John Wass wrote:
> Yeah I was thinking of the example repo as a prototype, just as if I was
> working on a feature in my fork of Daffodil.  The main project doesn't own
> the feature until it crosses the PR threshold, and once it does cross over
> the state of my fork is of no concern to it.
> 
> 
> 
> On Thu, Sep 9, 2021 at 3:54 PM Steve Lawrence <slawre...@apache.org> wrote:
> 
>> The concern is that this code was developed outside of Apache and so
>> didn't follow standard Apache process. From the IP clearance page:
>>
>> https://incubator.apache.org/ip-clearance/
>>
>>> Any code that was developed outside of the ASF SVN repository and
>>> our public mailing lists must be processed like this, even if the
>>> external developer is already an ASF committer.
>>
>> I suppose that submitting it as a PR does follow some of that process,
>> but there is maybe less assurance of ownership. Because it was not
>> developed in an ASF repository, that code is presumed to be owned by
>> you, multiple developers, or a company, and so that ownership must be
>> granted to ASF via the IP clearance process, with appropriate software
>> grant, CLA's, etc. (At least, that's my admittedly limited understanding
>> of the process).
>>
>> - Steve
>>
>>
>> On 9/9/21 3:34 PM, John Wass wrote:
>>> Couldn't we (the vscode contributors) submit a series of PRs against the
>>> new repo to move the code, and just archive the example repo as-is?
>>>
>>> I noted some thoughts on that a while back
>>> https://github.com/jw3/example-daffodil-vscode/issues/77
>>>
>>>
>>>
>>> On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike
>>> <mbecke...@owlcyberdefense.com> wrote:
>>>
>>>> I know of one file in the repo that will have to be removed: the
>>>> jpeg.dfdl.xsd file, which is there just as an example workspace.
>>>>
>>>> The copyright and provisions of that are not compatible with Apache
>>>> licensing.
>>>>
>>>> We can find a DFDL schema that we created that has Apache license to use
>>>> instead.
>>>>
>>>> For the other files under src, server, and build, can we generate a list
>>>> of files identifying which are:
>>>>
>>>> (a) original MIT-licensed, unmodified
>>>> (b) new - can be ASL
>>>> (c) blended - started from MIT-licensed source, modified with
>>>> daffodil-vscode-specific changes.
>>>>
>>>> It is these blended files that are the problematic ones.
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Steve Lawrence <slawre...@apache.org>
>>>> Sent: Thursday, September 9, 2021 1:38 PM
>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>> Subject: Re: daffodil-vscode - how to package and identify the
>>>> contribution - some git questions
>>>>
>>>> Correct. For more information about Apache license compatibility:
>>>>
>>>>   https://www.apache.org/legal/resolved.html
>>>>
>>>> MIT is Category A and is fine. EPL is Category B and is also okay, but
>>>> generally only in its binary form. So these top-level dependencies look
>>>> okay, assuming their transitive dependencies are also okay.
>>>>
>>>> We'll also need to verify the licenses of all code in the repo.
>>>> Hopefully little of that is original Microsoft MIT code, and the rest
>>>> can be granted to ASF and relicensed.
>>>>
>>>>
>>>> On 9/9/21 1:30 PM, Beckerle, Mike wrote:
>>>>> The requirement is that the entire dependency tree (transitively)
>>>>> cannot depend on any software that has an Apache-incompatible (aka
>>>>> restrictive) license.
>>>>>
>>>>> So we need the transitive closure of all dependencies.
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Adam Rosien <a...@rosien.net>
>>>>> Sent: Thursday, September 9, 2021 12:44 PM
>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>> Subject: Re: daffodil-vscode - how to package and identify the
>>>>> contribution - some git questions
>>>>>
>>>>> (I don't understand the requirements of licensing + transitive
>>>>> dependencies, so I'm giving some surface level license info)
>>>>>
>>>>> "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
>>>>> http://logback.qos.ch/license.html
>>>>> "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL 1.0
>>>>> "co.fs2" %% "fs2-io" % "3.0.4" - MIT
>>>>> "com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
>>>>> "org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0
>>>>>
>>>>> On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien <a...@rosien.net> wrote:
>>>>>
>>>>>> I can relay the list of dependencies and their licenses.
>>>>>>
>>>>>> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence <slawre...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I personally don't care too much about having the existing git history
>>>>>>> once it's part of ASF, especially if it makes things any easier (as you
>>>>>>> mention, squash/rebase can be difficult through merges). So I'd say we
>>>>>>> just do plan B--create a tarball of the current state (without the git
>>>>>>> history), and the content of that tarball is what goes through the IP
>>>>>>> clearance process, and is the content of the initial commit when adding
>>>>>>> to the apache/daffodil-vscode repo.
>>>>>>>
>>>>>>> Note that I think the incubator will still want access to the existing
>>>>>>> repo so they can view the full git history. Understanding where
>>>>>>> everything came from and verifying the provenance is important to
>>>>>>> ensuring we have all the appropriate CLA's. So while the tarball is
>>>>>>> maybe what is officially voted on, they will want access to the repo.
>>>>>>>
>>>>>>> That said, I don't think we are going to get CLA's for any
>>>>>>> Microsoft-contributed code. So either all Microsoft-contributed code
>>>>>>> will need to be kept MIT, or removed from the codebase. And it feels a
>>>>>>> bit odd to grant something to ASF where the original codebase stays MIT
>>>>>>> and isn't part of that grant.
>>>>>>>
>>>>>>> I think understanding how much code still exists that is Microsoft/MIT
>>>>>>> is going to be important to getting this through the IP clearance
>>>>>>> process.
>>>>>>>
>>>>>>> So I'm curious how much of that original Microsoft code still exists? I
>>>>>>> assume since it was just example code it has mostly been replaced? If
>>>>>>> that's the case, we could potentially say Microsoft has no ownership of
>>>>>>> this code, and so their CLA and MIT license aren't necessary?
>>>>>>>
>>>>>>> We should also have a good understanding of the dependencies. If any of
>>>>>>> them are not compatible with ALv2, then going through this process isn't
>>>>>>> even worth it until they are replaced. Do you have a list of the
>>>>>>> dependencies?
>>>>>>>
>>>>>>>
>>>>>>> On 9/9/21 11:16 AM, Beckerle, Mike wrote:
>>>>>>>> So the daffodil-vscode code-base wants to be granted to become part of
>>>>>>>> the Daffodil project.
>>>>>>>>
>>>>>>>> One question arises which is "what is the contribution?" exactly.
>>>>>>>>
>>>>>>>> The normal way this is identified is by creating a tarball of the
>>>>>>>> source files and specifying an sha or md5 hash of that file.
>>>>>>>>
>>>>>>>> However, this code base is perhaps different from usual.
>>>>>>>>
>>>>>>>> It started by creating a detached fork of the vscode debugger example
>>>>>>>> code base. This is MIT-Licensed which is a compatible license.
>>>>>>>>
>>>>>>>> The files are then edited. There are around 100 commits on top of the
>>>>>>>> base that came from the vscode debugger repository.
>>>>>>>>
>>>>>>>> So the contribution is that set of 100 commits - the
>>>>>>>> patches/change-sets they represent.
>>>>>>>>
>>>>>>>> These commits often edit the original files of the vscode debugger
>>>>>>>> example to add the daffodil-specific functionality. That is, the
>>>>>>>> contribution material is in several cases intermingled in the lines of
>>>>>>>> the existing files. That's ok I think so long as the modified file had
>>>>>>>> MIT license.
>>>>>>>>
>>>>>>>> There's some value in preserving the 100 commits by our contributors,
>>>>>>>> not squashing it down to one commit, though if it's really not sensible
>>>>>>>> to proceed otherwise, we can choose to squash it down to one commit.
>>>>>>>>
>>>>>>>> Furthermore, the vscode debugger example repo itself had many commits
>>>>>>>> in it. The current daffodil-vscode repo preserves all these commits as
>>>>>>>> well. I don't see value in preserving these commits, and would rather
>>>>>>>> they were squashed into a single "starting point" commit, with a
>>>>>>>> dependencies file specifying the githash where we forked from, just so
>>>>>>>> we can refer back if necessary.
>>>>>>>>
>>>>>>>> So a starting suggestion (subject to discussion of other alternatives)
>>>>>>>> is this:
>>>>>>>>
>>>>>>>> Plan A:
>>>>>>>>
>>>>>>>>  1. squash all commits up to and including the last Microsoft commit,
>>>>>>>>     together into one.
>>>>>>>>  2. rebase the remaining commits on top of that.
>>>>>>>>      1. I'm a bit worried about this rebase. There are merge commits,
>>>>>>>>         etc. in the history. I'm not sure this will just all rebase
>>>>>>>>         while preserving all the commits, but maybe it will "just work"
>>>>>>>>  3. create a "patch set" corresponding to the 100 or so commits that
>>>>>>>>     make up the "contribution".
>>>>>>>>      1. I don't know if this is even feasible for this many commits.
>>>>>>>>  4. create a tar/zip of this aggregate patch set.
>>>>>>>>  5. compute an md5 of this patch set.
>>>>>>>>
>>>>>>>> The patch set tar/zip file and its md5 hash are "the granted software".
>>>>>>>>
>>>>>>>> The problem with this idea is that there's no obvious way to review a
>>>>>>>> patch set, short of applying it.
>>>>>>>>
>>>>>>>> A better way may be to change steps 3 - 5 above to
>>>>>>>>
>>>>>>>> Plan B:
>>>>>>>>
>>>>>>>>     3. push the main branch to a new empty git repository
>>>>>>>>        The point of this is to remove all historic stuff from the
>>>>>>>>        repository, i.e., have a minimal git repo that contains only
>>>>>>>>        the contribution and the single other commit it must be based on.
>>>>>>>>
>>>>>>>>     4. create a tarball of this git repository, and md5 hash of it
>>>>>>>>
>>>>>>>>     5. document that the contribution is from githash X (after the
>>>>>>>>        first commit) to githash Y (the final commit) of this repository
>>>>>>>>
>>>>>>>>
>>>>>>>> This has the advantage that the contribution is a self-contained
>>>>>>>> review-able thing.
>>>>>>>>
>>>>>>>> Other ideas are welcome. (Plans C, D, etc) The only requirements I know
>>>>>>>> of are:
>>>>>>>>
>>>>>>>>  1. a single file containing the contribution, and its md5 hash
>>>>>>>>  2. a sensible way one can review the contents of this contribution file
>>>>>>>>  3. preserve history of derivation from the vscode debugger example.
>>>>>>>>
>>>>>>>> Mike Beckerle | Principal Engineer
>>>>>>>>
>>>>>>>> mbecke...@owlcyberdefense.com <mailto:bhum...@owlcyberdefense.com>
>>>>>>>>
>>>>>>>> P +1-781-330-0412
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
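
For reference, the "tarball of the current state (without the git history) plus a hash" idea from the thread could be sketched roughly as below. This is only an illustration under assumptions: the function names make_contribution_tarball and file_digest are hypothetical, the thread does not say which tool was actually used, and SHA-256 is used as the default here even though the thread mentions "sha or md5".

```python
import hashlib
import tarfile
from pathlib import Path


def make_contribution_tarball(source_dir: Path, tarball: Path) -> Path:
    """Archive source_dir into a gzip tarball, leaving out the .git directory.

    Hypothetical helper: a sketch of "tarball of the current state without
    the git history", not the procedure the project actually followed.
    """

    def skip_git(member: tarfile.TarInfo):
        # Returning None from the filter drops the member from the archive.
        return None if ".git" in Path(member.name).parts else member

    with tarfile.open(tarball, "w:gz") as archive:
        archive.add(source_dir, arcname=source_dir.name, filter=skip_git)
    return tarball


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Something like file_digest(make_contribution_tarball(Path("example-daffodil-vscode"), Path("contribution.tar.gz"))) would then give the value to record next to the archive (the directory and file names here are placeholders). Note that the digest identifies that specific archive file: a gzip archive rebuilt later will generally hash differently, so the file that was hashed is the one that would need to be kept and submitted.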
