Brandon,

I don’t know the exact policy. It’s certainly preferable that contributors
include headers in their contributions. But Matt points out that requiring
them raises the bar to contributing, so I think it’s OK to defer adding them
until release time.

(IANAL, but here’s my rationale: A contribution has the same legal status with 
and without headers. Code sitting in GitHub, even in the main branch, is not 
‘published’ in a legal sense until the release happens. But when the code is 
released, the headers need to be present to remind people of their obligations 
under the license.)
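
For illustration, the release-time clean-up Brandon asks about could be sketched roughly as below. This is only a sketch under assumptions, not an existing Hop or ASF tool: the function names `add_xml_header` and `bless_tree` are hypothetical, the header text is supplied by the caller (for example read from a file), the first line of that text is used to detect an existing header, and the .hpl/.hwf files are assumed to be XML.

```python
from pathlib import Path


def add_xml_header(text: str, header: str) -> str:
    """Return `text` with `header` added as an XML comment, if not already there.

    Assumption: the first non-empty line of `header` is distinctive enough
    to detect a header that was added earlier.
    """
    stripped = header.strip()
    if not stripped:
        raise ValueError("header must not be empty")
    if "--" in stripped:
        # XML does not allow "--" inside a comment.
        raise ValueError("header must not contain '--'")
    marker = stripped.splitlines()[0]
    if marker in text:
        return text  # already has the header, leave the file alone
    comment = "<!--\n" + stripped + "\n-->\n"
    if text.startswith("<?xml"):
        end = text.find("?>")
        if end != -1:
            end += 2
            # The XML declaration must stay first, so insert after it.
            return text[:end] + "\n" + comment + text[end:].lstrip("\r\n")
    return comment + text


def bless_tree(root: str, header: str, suffixes=(".hpl", ".hwf")) -> list:
    """Add the header to every matching file under `root`; return changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            original = path.read_text(encoding="utf-8")
            updated = add_xml_header(original, header)
            if updated != original:
                path.write_text(updated, encoding="utf-8")
                changed.append(path)
    return changed
```

Something like `bless_tree(".", Path("header.txt").read_text())` could then be run by the release manager, or by a contributor before opening a pull request. Running it twice leaves the files unchanged the second time.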

Julian


> On Jun 13, 2021, at 5:19 AM, Brandon Jackson <[email protected]> wrote:
> 
> When is it important for these legal headers to be present?
> Example:
> Does every commit have to have them?
> Is this a clean-up before release exercise where right before a release a
> script could modify every contribution and add the required headers to the
> textual files?
> 
> Understanding that small point could yield a small utility on the client
> side, like "asf-bless.sh" which could parse the tree of files and add
> headers where any are missing before contributors commit and generate a
> pull request.
> 
> Brandon
> 
> 
> 
> On Sun, Jun 13, 2021 at 4:12 AM Matt Casters <[email protected]> wrote:
> 
>> You are both right of course. I'm not sure why I didn't see it before but
>> we can simply add an option to add a header of choice defined in a file
>> somewhere to whichever file format we generate now (XML) or in the future.
>> Sure, it would not be visible to users of the GUI but it would solve the
>> conundrum.
>> 
>> On Sun, 13 Jun 2021 at 07:28, Julian Hyde <[email protected]> wrote:
>> 
>>> A couple more:
>>> 
>>> 5. Make the release manager responsible for adding headers to files that
>>> are missing them.
>>> 6. Use a tool such as autostyle [1] that detects problems and fixes them.
>>> 
>>> Julian
>>> 
>>> [1] https://github.com/autostyle/autostyle
>>> 
>>> 
>>>> On Jun 12, 2021, at 7:34 PM, Hans Van Akelyen <[email protected]> wrote:
>>>> 
>>>> The annoying part is that it needs to have the header when added to the
>>>> repository but outside of the repository it doesn't.
>>>> We currently have around 300 pipelines and 100 workflows in the
>>>> repository and we are advocating how easy it is to contribute these
>>>> things to hop. Now we would have to say, well it is easy but you need to
>>>> add a header... and guess what... every time you change something you
>>>> will need to add it again because using the save button will overwrite
>>>> your content.
>>>> 
>>>> There are a couple of ways to solve this:
>>>> 1) automate it with a github actions/Jenkins
>>>> 2) manually add the header
>>>> 3) add a toggle to the gui/code that needs to be activated when you are
>>>> creating pipelines for the repository
>>>> 4) move to a binary format
>>>> 
>>>> 1. Is not allowed/possible afaik, Jenkins definitely does not have write
>>>> access to the code base, Github might have permission to write to a pr.
>>>> But then the question arises if it is even allowed to add a header to a
>>>> file without the user confirming this.
>>>> 
>>>> 2. This adds another barrier for non-developers/regular users to
>>>> contribute samples and integration tests. They don't care about the
>>>> content of a hpl/hwf; in their eyes this is a "binary-file" that needs no
>>>> editing, and surely not every time you change a minor thing.
>>>> We are really trying hard to convince people to contribute small things
>>>> like a single sample or a single test, but noticed that even the usage of
>>>> github and how to create a PR can be a "hard" process that requires
>>>> hand-holding for our user base that consists mainly of non-developers.
>>>> This would raise the bar a bit higher, making it harder for those willing
>>>> to jump.
>>>> 
>>>> 3. This might work for the core developers/contributors but will probably
>>>> be forgotten by the friendly user that wants to contribute once, meaning
>>>> we would have to ask them to add the header or do it ourselves.
>>>> 
>>>> 4. No need for headers here; we could even keep the current xml structure
>>>> but zip the content of the hpl/hwf
>>>> 
>>>> So to summarize, in the short term getting a release out shouldn't be
>>>> hard. One of us can add the header to all the files and be done with it.
>>>> But in the long run this process is not sustainable.
>>>> 
>>>> Cheers,
>>>> Hans
>>>> 
>>>> On Sun, 13 Jun 2021 at 00:52, Julian Hyde <[email protected]> wrote:
>>>> 
>>>>> I still don’t see why the discussion about Apache release policy needs
>>>>> to be connected with discussion about file formats. It’s simpler to
>>>>> resolve the issue about release policy first, make the release, and come
>>>>> back and discuss file format later.
>>>>> 
>>>>> Regarding release policy. When a user contributes a test case to Hop,
>>>>> that is a creative work according to copyright law. Like any
>>>>> contribution, we don’t “claim copyright”; they retain copyright, but
>>>>> contribute under Apache license. And we require that text files have a
>>>>> header.
>>>>> 
>>>>> No one is proposing adding headers to pipeline and workflow files that
>>>>> are not contributed to Hop.
>>>>> 
>>>>> I find it hard to believe that adding a header to a test case will make
>>>>> it behave differently, in the vast majority of cases. Exceptions can be
>>>>> made for the few cases where it matters.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jun 12, 2021, at 3:25 PM, Matt Casters <[email protected]> wrote:
>>>>>> 
>>>>>> That's really my point: it's really not as straightforward as you
>>>>>> claimed, Julian.  The files are produced by the Hop GUI and that's what
>>>>>> we want.  We want to test what is actually used by our end-users, not
>>>>>> some theoretical use-case which is typically handled by
>>>>>> JUnit/Mockito/Powermock and their ilk.  It's this old-school vision that
>>>>>> an XML file has to be written by hand or something like that which
>>>>>> messes up this debate.
>>>>>> The .hpl/.hwf file format does not and should not include the ASF header
>>>>>> either.  For our users it would be inappropriate as we can't claim
>>>>>> copyright on works produced by others.  In other words, when some person
>>>>>> or company uses our software and creates a pipeline, we can't just claim
>>>>>> copyright for that file.  At least that's how I see things.
>>>>>> 
>>>>>> As for YAML: my dislike for it is enormous, but since it allows comments
>>>>>> it wouldn't solve the header issue, so I wouldn't pick it for that
>>>>>> reason alone.  Perhaps we should serialize in some binary format to get
>>>>>> past this issue.  Since we'll need to continue XML serialization anyway
>>>>>> it's just a question of storing the integration tests and samples in a
>>>>>> way that can be approved by the ASF.
>>>>>> 
>>>>>> 
>>>>>> On Sat, Jun 12, 2021 at 10:54 PM Julian Hyde <[email protected]> wrote:
>>>>>> 
>>>>>>> I don’t think the discussion about headers really forces this issue.
>>>>>>> It’s a technical decision and shouldn’t be rushed.
>>>>>>> 
>>>>>>> Regarding the headers. It is straightforward to add headers to existing
>>>>>>> files. It is also straightforward to use a tool such as checkstyle to
>>>>>>> enforce them (so, any PR that adds a .hpl file without a header will
>>>>>>> get a build error, which the contributor will duly fix).
>>>>>>> 
>>>>>>> In my opinion, Hop should allow multiple formats. XML is rather old,
>>>>>>> and people find it difficult to read without practice. JSON is a bit
>>>>>>> more modern, but has terrible support for multi-line strings and (in
>>>>>>> its official form) doesn’t allow comments and is strict about quoting
>>>>>>> of identifiers. YAML (or similar) is worth considering; its model is
>>>>>>> compatible with JSON, it allows comments, it has much better support
>>>>>>> for multi-line strings, and it tends to diff/merge easier than XML and
>>>>>>> JSON.
>>>>>>> 
>>>>>>> Julian
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jun 12, 2021, at 1:38 PM, Matt Casters <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Folks,
>>>>>>>> 
>>>>>>>> It's been up in the air for quite some time now but it looks like
>>>>>>>> we're being forced by certain discussions in the release voting of
>>>>>>>> 0.99-rc1. How would you feel about moving to JSON for the standard
>>>>>>>> file format of pipelines and workflows?
>>>>>>>> I propose .hpj and .hwj as extensions.
>>>>>>>> This would push back our releases for a month or so while we convert
>>>>>>>> the remaining serialization code to the new @HopMetadataProperty API
>>>>>>>> Cheers,
>>>>>>>> Matt
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Neo4j Chief Solutions Architect
>>>>>> *✉   *[email protected]
>>>>>> ☎  +32486972937
>>>>> 
>>>>> 
>>> 
>>> 
>> 
