It would complicate things to have an exclusion for hpl/hwf files in our RAT check and then have to do another check when creating a release. I think the safest way would be to catch these things on the pull request, as we already run a RAT check on every PR using a GitHub Action.
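For illustration, the kind of header check and fix discussed in this thread could be sketched in Python roughly as follows. This is only a sketch under assumptions, not how Apache RAT or Hop work: the function names, the marker string used to detect a header, and the choice to store the header as an XML comment right after the XML declaration are all hypothetical.

```python
from pathlib import Path

# Assumption: a header counts as present if this marker appears near the top of the file.
HEADER_MARKER = "Licensed to the Apache Software Foundation"


def has_header(text: str, marker: str = HEADER_MARKER) -> bool:
    """Return True if the marker occurs in the first 2048 characters."""
    return marker in text[:2048]


def add_header(text: str, header: str) -> str:
    """Insert `header` as an XML comment, after the XML declaration if there is one.

    The text is returned unchanged when a header is already present, so the
    function can safely be run more than once. The header text must not
    contain "--", which is not allowed inside an XML comment.
    """
    if has_header(text):
        return text
    comment = "<!--\n" + header.strip() + "\n-->\n"
    if text.startswith("<?xml"):
        end = text.index("?>") + 2  # position just past the XML declaration
        return text[:end] + "\n" + comment + text[end:].lstrip("\n")
    return comment + text


def find_missing(root: Path) -> list[Path]:
    """List the .hpl/.hwf files under `root` that have no header (the PR-time check)."""
    files = sorted(p for ext in ("*.hpl", "*.hwf") for p in root.rglob(ext))
    return [p for p in files if not has_header(p.read_text(encoding="utf-8"))]
```

A PR build could fail when `find_missing` returns a non-empty list, and a committer could then apply `add_header` to each reported file with the header text read from a file in the repository.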
The option to have an environment variable or toggle in the GUI for developers and contributors to enable is a valid solution. There will be cases where one-time contributors forget to enable it, but then it is up to the committers to point them to the solution or to add an extra commit to include the headers. This would then be as simple as validating the workflow/pipeline and resaving the content.

Cheers,
Hans

On Sun, 13 Jun 2021 at 21:26, Julian Hyde <[email protected]> wrote:

> Brandon,
>
> I don’t know the exact policy. It’s certainly preferable that contributors include headers in their contribution. But, Matt points out that that raises the bar to contributions, so I think it's OK to defer them until release time.
>
> (IANAL, but here’s my rationale: A contribution has the same legal status with and without headers. Code sitting in GitHub, even in the main branch, is not ‘published’ in a legal sense until the release happens. But when the code is released, the headers need to be present to remind people of their obligations under the license.)
>
> Julian
>
> > On Jun 13, 2021, at 5:19 AM, Brandon Jackson <[email protected]> wrote:
> >
> > When is it important for these legal headers to be present?
> > Example:
> > Does every commit have to have them?
> > Is this a clean-up before release exercise where right before a release a script could modify every contribution and add the required headers to the textual files?
> >
> > Understanding that small point could yield a small utility on the client side, like "asf-bless.sh" which could parse the tree of files and add headers where any are missing before contributors commit and generate a pull request.
> >
> > Brandon
> >
> > On Sun, Jun 13, 2021 at 4:12 AM Matt Casters <[email protected]> wrote:
> >
> >> You are both right of course.
> >> I'm not sure why I didn't see it before but we can simply add an option to add a header of choice defined in a file somewhere to whichever file format we generate now (XML) or in the future.
> >> Sure, it would not be visible to users of the GUI but it would solve the conundrum.
> >>
> >> On Sun, 13 Jun 2021 at 07:28, Julian Hyde <[email protected]> wrote:
> >>
> >>> A couple more:
> >>>
> >>> 5. Make the release manager responsible for adding headers to files that are missing them.
> >>> 6. Use a tool such as autostyle [1] that detects problems and fixes them.
> >>>
> >>> Julian
> >>>
> >>> [1] https://github.com/autostyle/autostyle
> >>>
> >>>> On Jun 12, 2021, at 7:34 PM, Hans Van Akelyen <[email protected]> wrote:
> >>>>
> >>>> The annoying part is that it needs to have the header when added to the repository but outside of the repository it doesn't.
> >>>> We currently have around 300 pipelines and 100 workflows in the repository and we are advocating how easy it is to contribute these things to Hop. Now we would have to say, well it is easy but you need to add a header... and guess what... every time you change something you will need to add it again because using the save button will overwrite your content.
> >>>>
> >>>> There are a couple of ways to solve this:
> >>>> 1) automate it with GitHub Actions/Jenkins
> >>>> 2) manually add the header
> >>>> 3) add a toggle to the gui/code that needs to be activated when you are creating pipelines for the repository
> >>>> 4) move to a binary format
> >>>>
> >>>> 1. Is not allowed/possible afaik. Jenkins definitely does not have write access to the code base; GitHub might have permission to write to a PR. But then the question arises if it is even allowed to add a header to a file without the user confirming this.
> >>>>
> >>>> 2. This adds another boundary for non-developers/regular users to contribute samples and integration tests. They don't care about the content of an hpl/hwf; in their eyes this is a "binary-file" that needs no editing, and surely not every time you change a minor thing.
> >>>> We are really trying hard to convince people to contribute small things like a single sample or a single test, but noticed that even the usage of GitHub and how to create a PR can be a "hard" process that requires hand-holding for our user base that consists mainly of non-developers. This would raise the bar a bit higher, making it harder for those willing to jump.
> >>>>
> >>>> 3. This might work for the core developers/contributors but will probably be forgotten by the friendly user that wants to contribute once, meaning we would have to point them to add the header or do it ourselves.
> >>>>
> >>>> 4. No need for headers here; we could even keep the current XML structure but zip the content of the hpl/hwf.
> >>>>
> >>>> So to summarize, in the short term getting a release out shouldn't be hard. One of us can add the header to all the files and be done with it. But in the long run this process is not sustainable.
> >>>>
> >>>> Cheers,
> >>>> Hans
> >>>>
> >>>> On Sun, 13 Jun 2021 at 00:52, Julian Hyde <[email protected]> wrote:
> >>>>
> >>>>> I still don’t see why the discussion about Apache release policy needs to be connected with discussion about file formats. It’s simpler to resolve the issue about release policy first, make the release, and come back and discuss file format later.
> >>>>>
> >>>>> Regarding release policy. When a user contributes a test case to Hop, that is a creative work according to copyright law.
> >>>>> Like any contribution, we don’t “claim copyright”; they retain copyright, but contribute under Apache license. And we require that text files have a header.
> >>>>>
> >>>>> No one is proposing adding headers to pipeline and workflow files that are not contributed to Hop.
> >>>>>
> >>>>> I find it hard to believe that adding a header to a test case will make it behave differently, in the vast majority of cases. Exceptions can be made for the few cases where it matters.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>> On Jun 12, 2021, at 3:25 PM, Matt Casters <[email protected]> wrote:
> >>>>>>
> >>>>>> That's really my point: it's really not as straightforward as you claimed, Julian. The files are produced by the Hop GUI and that's what we want. We want to test what is actually used by our end-users, not some theoretical use-case which is typically handled by JUnit/Mockito/Powermock and their ilk. It's this old-school vision that an XML file has to be written by hand or something like that which messes up this debate.
> >>>>>> The .hpl/.hwf file format does not and should not include the ASF header either. For our users it would be inappropriate as we can't claim copyright on works produced by others. In other words, when some person or company uses our software and creates a pipeline, we can't just claim copyright for that file. At least that's how I see things.
> >>>>>>
> >>>>>> As for YAML: my dislike for it is enormous but since it wouldn't solve the header issue I wouldn't pick it for that reason alone since it allows comments. Perhaps we should serialize in some binary format to get past this issue.
> >>>>>> Since we'll need to continue XML serialization anyway it's just a question of storing the integration tests and samples in a way that can be approved by the ASF.
> >>>>>>
> >>>>>> On Sat, Jun 12, 2021 at 10:54 PM Julian Hyde <[email protected]> wrote:
> >>>>>>
> >>>>>>> I don’t think the discussion about headers really forces this issue. It’s a technical decision and shouldn’t be rushed.
> >>>>>>>
> >>>>>>> Regarding the headers. It is straightforward to add headers to existing files. It is also straightforward to use a tool such as checkstyle to enforce them (so, any PR that adds a .hpl file without a header will get a build error, which the contributor will duly fix).
> >>>>>>>
> >>>>>>> In my opinion, Hop should allow multiple formats. XML is rather old, and people find it difficult to read without practice. JSON is a bit more modern, but has terrible support for multi-line strings and (in its official form) doesn’t allow comments and is strict about quoting of identifiers. YAML (or similar) is worth considering; its model is compatible with JSON, it allows comments, it has much better support for multi-line strings, and it tends to diff/merge easier than XML and JSON.
> >>>>>>>
> >>>>>>> Julian
> >>>>>>>
> >>>>>>>> On Jun 12, 2021, at 1:38 PM, Matt Casters <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Folks,
> >>>>>>>>
> >>>>>>>> It's been up in the air for quite some time now but it looks like we're being forced by certain discussions in the release voting of 0.99-rc1. How would you feel about moving to JSON for the standard file format of pipelines and workflows?
> >>>>>>>> I propose .hpj and .hwj as extensions.
> >>>>>>>> This would push back our releases for a month or so while we convert the remaining serialization code to the new @HopMetadataProperty API
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Matt
> >>>>>>
> >>>>>> --
> >>>>>> Neo4j Chief Solutions Architect
> >>>>>> *✉ *[email protected]
> >>>>>> ☎ +32486972937
