Re: Controlling the images used for the builds/releases

2020-12-16 Thread Jarek Potiuk
Hello, 

I finally had some time after the Airflow 2.0 release and I opened a discussion about 
the policy in legal-disc...@apache.org:
 
https://lists.apache.org/thread.html/r7c9ceb3d6c764119b14dfedb0e22957993d93cf529792c402aaa05fc%40%3Clegal-discuss.apache.org%3E

I propose we continue the discussion there.

J.

On 2020/09/14 18:00:09, Dave Fisher  wrote: 
> Hi Jarek,
> 
> I’ve yet to read your Cwiki, but I am on the OpenOffice PMC.
> 
> (1) If you wish to discuss our build processes for CentOS, Windows, and macOS 
> please email d...@openoffice.apache.org. We are working towards our 4.1.8 
> release for the 20th Anniversary of Openoffice.org.
> 
> (2) If you wish to understand the many artifacts produced:
> 
> Source - https://dist.apache.org/repos/dist/release/openoffice/4.1.7/source/
> SDK - 
> https://dist.apache.org/repos/dist/release/openoffice/4.1.7/binaries/SDK/
> User installation and language packs - 
> https://dist.apache.org/repos/dist/release/openoffice/4.1.7/binaries/
> 
> There are currently 41 different languages in 4 Linux flavors, 1 Windows and 
> 1 macOS.
> 
> Total installation and language binaries are 41*2*(1+1+4) = 492 binaries x 4 
> = 1968 files.
> 
> Note that for macOS we create .dmg files, and for Windows, installer .exe 
> executables.
> 
> (3) Due to the huge size of all of our binaries OpenOffice is NOT distributed 
> through the Apache Mirrors. Instead we are allowed to distribute through 
> SourceForge.net
> 
> Regards,
> Dave
> 
> > On Sep 14, 2020, at 10:14 AM, Jarek Potiuk  wrote:
> > 
> > Joan,
> > 
> > I read your comment and I have a kind request - hopefully you are not yet
> > out - you mentioned in the comment Open Office and artifacts that would not
> > fall into the criteria proposed. Could you please point us to one or two
> > examples of such artifacts and someone that could carry the discussion -
> > while you are away? I think I would like to understand what the problem is
> > but it might be difficult to answer your doubts without having some
> > specific examples that we can base our discussion on and someone who is at
> > least a bit familiar with the matter.
> > 
> > J.
> > 
> > 
> > On Mon, Sep 14, 2020 at 6:30 PM Jarek Potiuk 
> > wrote:
> > 
> >> Very true Matt.
> >> 
> >> I think this is really a crucial part of the proposal to define the
> >> boundary between the Apache / Non-Apache artifacts (potentially with a
> >> different, non-ASF compliant license).
> >> 
> >> The "compiled" vs.  "packaged" that I proposed is one way of looking
> >> at it, rather simple and straightforward to understand, verify, and
> >> reason about. But I would love to hear other ideas - maybe some other
> >> communities and OSS organizations approached it already and they came
> >> up with some other ways of classifying it ?
> >> 
> >> One thing that is quite important here - we are not really talking
> >> about "releases" and we should continue avoiding the name. I have no
> >> doubt that proper release is .tar.gz signed and checksummed on
> >> Apache's SVN containing sources and instructions on how to build the
> >> software (including the convenience packages) using platforms and
> >> tools available. There are no other "releases" by ASF, and I think
> >> there should not be.
> >> 
> >> I keep on reminding it to myself when I proposed the changes, that
> >> "convenience packages" are not "official" ASF software releases so I
> >> think the policies there - however legal and "correct" do not have to
> >> be that strict.
> >> 
> >> I am not a lawyer to grasp all the implications - so I am really
> >> looking at the "crowd wisdom here" to understand all the consequences.
> >> I think we will never get a 100% correct and "compilable" policy (so
> >> to speak). My wife is a lawyer by education, so I know very well from
> >> her that "law does not compile" (which was a bit surprising to an
> >> engineer like me initially).
> >> 
> >> I think eventually - we will have to make some interpretations and
> >> assumptions, and eventually, the ASF might have to take some risks
> >> when reviewing and accepting such a proposal. But the risk-taking
> >> should be very well informed in this case so I think we should gather
> >> a lot of inputs and opinions on that.
> >> 
> >> J
> >> 
> >> 
> >> On Mon, Sep 14, 2020 at 6:08 PM Matt Sicker  wrote:
> >>> 
> >>> From a distribution standpoint, the point of these policies to me has
> >>> been to emphasize that anything we distribute here at Apache can be
> >>> safely used and copied under the terms of the Apache License. As such,
> >>> source releases have always been the target, though over time, Apache
> >>> has accumulated several end-user type projects that may or may not
> >>> have a developer audience that knows what to do with source code. The
> >>> binary distributions become a useful channel for projects so that
> >>> users can actually use the project without technical knowledge of
> >>> development environment setups and such. This raises a con

Re: Controlling the images used for the builds/releases

2020-09-15 Thread Jarek Potiuk
Just cross-posting the discussion at
https://lists.apache.org/thread.html/r8ff55d638c2efa1251636556881ef2e8a6305d19fddf184fcea96099%40%3Cdev.community.apache.org%3E


Would it be possible to move any discussion there? I have a feeling
dev@community is a better place to discuss the subject and I think the
discussions are very closely related.

Here is the important part of my message from that cross-posted message:

I have just run through a few of the comments we have there and, for
the sake of keeping people informed on what has changed so far, here are
some "gists" of my changes compared to the first draft:

* there is an open question about the viability of putting all the
instructions or scripts to build the binary dependencies into the released
sources. OpenOffice, CouchDB and Erlang were given as examples where this is
next to impossible to do. So I proposed to explicitly say that any form
of the instructions: scripts, manual instructions or POINTERS to the right
instructions is fine. A simple HTTP link where you can find how to build an
external OSS library should be perfectly fine. My ultimate goal is that
whenever the source .tar.gz package is created, the user can get the
sources and, by following the instructions (including the links to
instructions), can potentially rebuild the software legally. It
might be a complex and recursive process (build a library to build a
library) and at times those instructions might not work (as it is with all
instructions) but at least an attempt should be made to make it
possible.

* The "official" 3rd-party binary package is not a good name - I replaced
it for now with the "maintained OSS" binary package. The idea behind it, in
short, is that it should be open-source and it should be maintained. So
while the name does not reflect all the subtleties of "maintained" and
"OSS", it reflects the spirit. I tried to make the "recursive"
definition as relaxed as possible (in terms of SHOULD vs. MUST, except
for the licensing, which is a MUST).

* In pretty much all cases where I write about "best practices", they are
not absolute requirements - so whenever possible they are SHOULD instead of
MUST. I am very far from imposing all the best practices on all ASF
projects - that would be an impractical and stupid thing to do. I really treat
those "best practices" as "beacons" - targets that we can have in mind but
might never fully achieve. And as long as we have a good reason not to
follow those practices - by all means we do not have to. But if easy and
possible, I see the best practices as a powerful message that improves the
"Brand" of ASF in general from the user perspective. There are no "bonus
points" for projects that follow them vs. those which decide not to in
particular cases. But having those as "targets" for ASF projects is an
important message.

J.


On Mon, Sep 14, 2020 at 9:10 PM Jarek Potiuk 
wrote:

> Yep. I have maybe not intimate knowledge of all the licensing details but
> I am really interested in licenses in general and I am rather familiar with
> the doc (I put it also as reference in my proposal). I literally wanted the
> proposal to use everything that is already there and come up with an
> absolute minimum set of changes.
> I think I am into getting feedback and comments. And the proposal is - I
> think - still far from what we might eventually end up with.
>
> To the point of Class B and Helm Charts - I tried to approach it in a more
> general way than just Helm Charts but to figure out if we can actually come
> up with some policy that will cover the wider set of "packaging" mechanisms
> than just Helm Chart.
> For example, the Container images (which actually are dependencies of the
> Helm Charts) are the ones that are much more "problematic".
>
> As I explained in the "context" of the proposal
> https://cwiki.apache.org/confluence/display/COMDEV/Updates+of+policies+for+the+convenience+packages
> pretty much any container contains GPL code. And there is no easy solution
> with limiting those to Class B. Not with Python, not with Java, not with
> most of the other language containers, no matter how hard we all try.
>
> Also - if I understand correctly the worries of the OOffice project, I think
> the current proposal addresses pretty well the "size" of the packages. In
> the proposal - I do not even propose to publish all those sources etc. It's
> just that we give the users clear instructions on how they could (if they
> are determined enough) build them all (from the legal and technical point
> of view). And it could be "recursive" - as long as we know that
> the dependency we point to can be built and has instructions - we can
> simply point to those instructions when we release the sources.
>
> I know that there are many dependencies, but trying to understand what is
> the whole OOffice build process is truly a Herculean effort. But I would
> love to get to the bottom of the issue raised by Joan and try to adapt it.
>
> 

Re: Controlling the images used for the builds/releases

2020-09-14 Thread Jarek Potiuk
Yep. I have maybe not intimate knowledge of all the licensing details but I
am really interested in licenses in general and I am rather familiar with
the doc (I put it also as reference in my proposal). I literally wanted the
proposal to use everything that is already there and come up with an
absolute minimum set of changes.
I think I am into getting feedback and comments. And the proposal is - I
think - still far from what we might eventually end up with.

To the point of Class B and Helm Charts - I tried to approach it in a more
general way than just Helm Charts but to figure out if we can actually come
up with some policy that will cover the wider set of "packaging" mechanisms
than just Helm Chart.
For example, the Container images (which actually are dependencies of the
Helm Charts) are the ones that are much more "problematic".

As I explained in the "context" of the proposal
https://cwiki.apache.org/confluence/display/COMDEV/Updates+of+policies+for+the+convenience+packages
pretty much any container contains GPL code. And there is no easy solution
with limiting those to Class B. Not with Python, not with Java, not with
most of the other language containers, no matter how hard we all try.

Also - if I understand correctly the worries of the OOffice project, I think
the current proposal addresses pretty well the "size" of the packages. In
the proposal - I do not even propose to publish all those sources etc. It's
just that we give the users clear instructions on how they could (if they
are determined enough) build them all (from the legal and technical point
of view). And it could be "recursive" - as long as we know that
the dependency we point to can be built and has instructions - we can
simply point to those instructions when we release the sources.

I know that there are many dependencies, but trying to understand what is
the whole OOffice build process is truly a Herculean effort. But I would
love to get to the bottom of the issue raised by Joan and try to adapt it.

So I'd really appreciate some help here. Would it be possible for you to
point me to one or a few particular dependencies/artifacts that you think
might have a problem with those assumptions:

- we can point the user to the sources of those artifacts
- we can tell them how they can build them
- they can do it legally and technically on their own using available tools
and platforms
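
To make the three assumptions above a bit more concrete, here is a minimal
Python sketch of the "recursive" check. Everything in it (the Artifact
class, its field names, the example.org URLs) is hypothetical and only
illustrates the idea; none of it comes from the proposal itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Artifact:
    """Hypothetical record describing one binary dependency."""
    name: str
    source_url: Optional[str] = None          # where the sources can be found
    build_instructions: Optional[str] = None  # script, manual steps, or a link
    legally_buildable: bool = False           # buildable with available tools/platforms
    dependencies: List["Artifact"] = field(default_factory=list)


def find_problems(artifact: Artifact) -> List[str]:
    """Return names of artifacts (checked recursively) that fail any assumption."""
    problems = []
    meets_all = (
        artifact.source_url is not None
        and artifact.build_instructions is not None
        and artifact.legally_buildable
    )
    if not meets_all:
        problems.append(artifact.name)
    for dep in artifact.dependencies:
        problems.extend(find_problems(dep))
    return problems


# Illustrative data only: one well-described dependency and one unknown one.
base = Artifact("base-image", "https://example.org/base/src",
                "https://example.org/base/BUILDING", True)
chart = Artifact("helm-chart", "https://example.org/chart/src",
                 "see README", True, [base, Artifact("mystery-lib")])
print(find_problems(chart))  # ['mystery-lib']
```

The sketch does not guard against circular dependencies; a real check would
need to track already-visited artifacts.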

I understand there might be some limitations like signing the .dmg images
for macOS for example (where you technically need an Apple-approved
certificate to distribute). But I think we are talking about power users,
who can disable any distribution limitations and work in "developer" mode.
If those need some clarification, we can add it.

J.


On Mon, Sep 14, 2020 at 8:23 PM Dave Fisher  wrote:

> Hi Jarek,
>
> I’m sure that you have reviewed https://www.apache.org/legal/resolved.html
>
> I think that you might want to focus on Class B licenses in these
> discussions.
>
> It might help you to keep in a more limited scope and determine how to
> make compliant Helm Charts.
>
> The legal committee and VP are the ones making decisions about what is
> compliant.
>
> Regards,
> Dave
>
> > On Sep 14, 2020, at 9:30 AM, Jarek Potiuk 
> wrote:
> >
> > Very true Matt.
> >
> > I think this is really a crucial part of the proposal to define the
> > boundary between the Apache / Non-Apache artifacts (potentially with a
> > different, non-ASF compliant license).
> >
> > The "compiled" vs.  "packaged" that I proposed is one way of looking
> > at it, rather simple and straightforward to understand, verify, and
> > reason about. But I would love to hear other ideas - maybe some other
> > communities and OSS organizations approached it already and they came
> > up with some other ways of classifying it ?
> >
> > One thing that is quite important here - we are not really talking
> > about "releases" and we should continue avoiding the name. I have no
> > doubt that proper release is .tar.gz signed and checksummed on
> > Apache's SVN containing sources and instructions on how to build the
> > software (including the convenience packages) using platforms and
> > tools available. There are no other "releases" by ASF, and I think
> > there should not be.
> >
> > I keep on reminding it to myself when I proposed the changes, that
> > "convenience packages" are not "official" ASF software releases so I
> > think the policies there - however legal and "correct" do not have to
> > be that strict.
> >
> > I am not a lawyer to grasp all the implications - so I am really
> > looking at the "crowd wisdom here" to understand all the consequences.
> > I think we will never get a 100% correct and "compilable" policy (so
> > to speak). My wife is a lawyer by education, so I know very well from
> > her that "law does not compile" (which was a bit surprising to an
> > engineer like me initially).
> >
> > I think eventually - we will have to make some interpretations and
> > assumptions, and eventua

Re: Controlling the images used for the builds/releases

2020-09-14 Thread Dave Fisher
Hi Jarek,

I’m sure that you have reviewed https://www.apache.org/legal/resolved.html

I think that you might want to focus on Class B licenses in these discussions.

It might help you to keep in a more limited scope and determine how to make 
compliant Helm Charts.

The legal committee and VP are the ones making decisions about what is 
compliant.

Regards,
Dave

> On Sep 14, 2020, at 9:30 AM, Jarek Potiuk  wrote:
> 
> Very true Matt.
> 
> I think this is really a crucial part of the proposal to define the
> boundary between the Apache / Non-Apache artifacts (potentially with a
> different, non-ASF compliant license).
> 
> The "compiled" vs.  "packaged" that I proposed is one way of looking
> at it, rather simple and straightforward to understand, verify, and
> reason about. But I would love to hear other ideas - maybe some other
> communities and OSS organizations approached it already and they came
> up with some other ways of classifying it ?
> 
> One thing that is quite important here - we are not really talking
> about "releases" and we should continue avoiding the name. I have no
> doubt that proper release is .tar.gz signed and checksummed on
> Apache's SVN containing sources and instructions on how to build the
> software (including the convenience packages) using platforms and
> tools available. There are no other "releases" by ASF, and I think
> there should not be.
> 
> I keep on reminding it to myself when I proposed the changes, that
> "convenience packages" are not "official" ASF software releases so I
> think the policies there - however legal and "correct" do not have to
> be that strict.
> 
> I am not a lawyer to grasp all the implications - so I am really
> looking at the "crowd wisdom here" to understand all the consequences.
> I think we will never get a 100% correct and "compilable" policy (so
> to speak). My wife is a lawyer by education, so I know very well from
> her that "law does not compile" (which was a bit surprising to an
> engineer like me initially).
> 
> I think eventually - we will have to make some interpretations and
> assumptions, and eventually, the ASF might have to take some risks
> when reviewing and accepting such a proposal. But the risk-taking
> should be very well informed in this case so I think we should gather
> a lot of inputs and opinions on that.
> 
> J
> 
> 
> On Mon, Sep 14, 2020 at 6:08 PM Matt Sicker  wrote:
>> 
>> From a distribution standpoint, the point of these policies to me has
>> been to emphasize that anything we distribute here at Apache can be
>> safely used and copied under the terms of the Apache License. As such,
>> source releases have always been the target, though over time, Apache
>> has accumulated several end-user type projects that may or may not
>> have a developer audience that knows what to do with source code. The
>> binary distributions become a useful channel for projects so that
>> users can actually use the project without technical knowledge of
>> development environment setups and such. This raises a conundrum,
>> though, that nearly any non-trivial binary software artifact will
>> contain or link to code that is not distributed under the Apache
>> License, but it may be compatible (e.g., GPLv3 is compatible with
>> ALv2, but combining the two results in GPLv3 basically, not
>> ALv2+GPLv3; this doesn't change existing licenses of course). For our
>> end users downloading Apache artifacts, we've had a history of
>> publishing IP-safe source code that is easily used under the ALv2. I
>> think the historical problem behind why binary artifacts haven't been
>> raised to the same status involves clarifying the line between where
>> our artifacts end and a third party's begin. This is especially
>> apparent in languages where the reference implementation runtime is
>> GPL (e.g., OpenJDK, though that itself has an interesting history due
>> to Apache Harmony having been a thing at one point).
>> 
>> From a security standpoint, distributing binaries requires more
>> infrastructural security to respond to potential malware infections,
>> CVEs in dependencies, etc.
>> 
>> 
>> On Mon, 14 Sep 2020 at 10:54, Jarek Potiuk  wrote:
>>> 
>>> Oh yeah. I start realizing now how herculean it is :). No worries, I am
>>> afraid when you are back, the discussion will be just warming up :).
>>> 
>>> Speaking of the "double standard" - the main reason really comes from
>>> licensing. When you compile something in that is GPL, your code starts to
>>> be bound by the licence. But when you just bundle it together in a software
>>> package - you are not.
>>> 
>>> So this is pretty much unavoidable to apply different rules to those
>>> situations. No matter what - we have to make this distinction IMHO. But
>>> let's see what others say on that.  I'd love to hear your thought on that,
>>> before you head out.
>>> 
>>> J
>>> 
>>> 
>>> On Mon, Sep 14, 2020 at 5:47 PM Joan Touzet  wrote:
>>> 
>>>> Hi Jarek,
>>>> 
>>>> I'm about to head out for 3 weeks, so

Re: Controlling the images used for the builds/releases

2020-09-14 Thread Dave Fisher
Hi Jarek,

I’ve yet to read your Cwiki, but I am on the OpenOffice PMC.

(1) If you wish to discuss our build processes for CentOS, Windows, and macOS 
please email d...@openoffice.apache.org. We are working towards our 4.1.8 
release for the 20th Anniversary of Openoffice.org.

(2) If you wish to understand the many artifacts produced:

Source - https://dist.apache.org/repos/dist/release/openoffice/4.1.7/source/
SDK - https://dist.apache.org/repos/dist/release/openoffice/4.1.7/binaries/SDK/
User installation and language packs - 
https://dist.apache.org/repos/dist/release/openoffice/4.1.7/binaries/

There are currently 41 different languages in 4 Linux flavors, 1 Windows and 1 
macOS.

Total installation and language binaries are 41*2*(1+1+4) = 492 binaries x 4 = 
1968 files.
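
As a quick check of the arithmetic above, here is a small Python sketch. The
variable names are illustrative, and the meaning given to each factor is an
assumption, not something stated in the message: the factor of 2 is read as
"installation set plus language pack", and the final factor of 4 is kept as
an unexplained files-per-binary multiplier.

```python
languages = 41
package_kinds = 2        # assumed: installation set + language pack
platforms = 1 + 1 + 4    # Windows + macOS + 4 Linux flavors
files_per_binary = 4     # factor taken from the message; its meaning is not stated

binaries = languages * package_kinds * platforms
files = binaries * files_per_binary
print(binaries, files)  # 492 1968
```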

Note that for macOS we create .dmg files, and for Windows, installer .exe executables.

(3) Due to the huge size of all of our binaries OpenOffice is NOT distributed 
through the Apache Mirrors. Instead we are allowed to distribute through 
SourceForge.net

Regards,
Dave

> On Sep 14, 2020, at 10:14 AM, Jarek Potiuk  wrote:
> 
> Joan,
> 
> I read your comment and I have a kind request - hopefully you are not yet
> out - you mentioned in the comment Open Office and artifacts that would not
> fall into the criteria proposed. Could you please point us to one or two
> examples of such artifacts and someone that could carry the discussion -
> while you are away? I think I would like to understand what the problem is
> but it might be difficult to answer your doubts without having some
> specific examples that we can base our discussion on and someone who is at
> least a bit familiar with the matter.
> 
> J.
> 
> 
> On Mon, Sep 14, 2020 at 6:30 PM Jarek Potiuk 
> wrote:
> 
>> Very true Matt.
>> 
>> I think this is really a crucial part of the proposal to define the
>> boundary between the Apache / Non-Apache artifacts (potentially with a
>> different, non-ASF compliant license).
>> 
>> The "compiled" vs.  "packaged" that I proposed is one way of looking
>> at it, rather simple and straightforward to understand, verify, and
>> reason about. But I would love to hear other ideas - maybe some other
>> communities and OSS organizations approached it already and they came
>> up with some other ways of classifying it ?
>> 
>> One thing that is quite important here - we are not really talking
>> about "releases" and we should continue avoiding the name. I have no
>> doubt that proper release is .tar.gz signed and checksummed on
>> Apache's SVN containing sources and instructions on how to build the
>> software (including the convenience packages) using platforms and
>> tools available. There are no other "releases" by ASF, and I think
>> there should not be.
>> 
>> I keep on reminding it to myself when I proposed the changes, that
>> "convenience packages" are not "official" ASF software releases so I
>> think the policies there - however legal and "correct" do not have to
>> be that strict.
>> 
>> I am not a lawyer to grasp all the implications - so I am really
>> looking at the "crowd wisdom here" to understand all the consequences.
>> I think we will never get a 100% correct and "compilable" policy (so
>> to speak). My wife is a lawyer by education, so I know very well from
>> her that "law does not compile" (which was a bit surprising to an
>> engineer like me initially).
>> 
>> I think eventually - we will have to make some interpretations and
>> assumptions, and eventually, the ASF might have to take some risks
>> when reviewing and accepting such a proposal. But the risk-taking
>> should be very well informed in this case so I think we should gather
>> a lot of inputs and opinions on that.
>> 
>> J
>> 
>> 
>> On Mon, Sep 14, 2020 at 6:08 PM Matt Sicker  wrote:
>>> 
>>> From a distribution standpoint, the point of these policies to me has
>>> been to emphasize that anything we distribute here at Apache can be
>>> safely used and copied under the terms of the Apache License. As such,
>>> source releases have always been the target, though over time, Apache
>>> has accumulated several end-user type projects that may or may not
>>> have a developer audience that knows what to do with source code. The
>>> binary distributions become a useful channel for projects so that
>>> users can actually use the project without technical knowledge of
>>> development environment setups and such. This raises a conundrum,
>>> though, that nearly any non-trivial binary software artifact will
>>> contain or link to code that is not distributed under the Apache
>>> License, but it may be compatible (e.g., GPLv3 is compatible with
>>> ALv2, but combining the two results in GPLv3 basically, not
>>> ALv2+GPLv3; this doesn't change existing licenses of course). For our
>>> end users downloading Apache artifacts, we've had a history of
>>> publishing IP-safe source code that is easily used under the ALv2. I
>>> think the historical problem behind why binary artifacts haven'

Re: Controlling the images used for the builds/releases

2020-09-14 Thread Jarek Potiuk
Joan,

I read your comment and I have a kind request - hopefully you are not yet
out - you mentioned in the comment Open Office and artifacts that would not
fall into the criteria proposed. Could you please point us to one or two
examples of such artifacts and someone that could carry the discussion -
while you are away? I think I would like to understand what the problem is
but it might be difficult to answer your doubts without having some
specific examples that we can base our discussion on and someone who is at
least a bit familiar with the matter.

J.


On Mon, Sep 14, 2020 at 6:30 PM Jarek Potiuk 
wrote:

> Very true Matt.
>
> I think this is really a crucial part of the proposal to define the
> boundary between the Apache / Non-Apache artifacts (potentially with a
> different, non-ASF compliant license).
>
> The "compiled" vs.  "packaged" that I proposed is one way of looking
> at it, rather simple and straightforward to understand, verify, and
> reason about. But I would love to hear other ideas - maybe some other
> communities and OSS organizations approached it already and they came
> up with some other ways of classifying it ?
>
> One thing that is quite important here - we are not really talking
> about "releases" and we should continue avoiding the name. I have no
> doubt that proper release is .tar.gz signed and checksummed on
> Apache's SVN containing sources and instructions on how to build the
> software (including the convenience packages) using platforms and
> tools available. There are no other "releases" by ASF, and I think
> there should not be.
>
> I keep on reminding it to myself when I proposed the changes, that
> "convenience packages" are not "official" ASF software releases so I
> think the policies there - however legal and "correct" do not have to
> be that strict.
>
> I am not a lawyer to grasp all the implications - so I am really
> looking at the "crowd wisdom here" to understand all the consequences.
> I think we will never get a 100% correct and "compilable" policy (so
> to speak). My wife is a lawyer by education, so I know very well from
> her that "law does not compile" (which was a bit surprising to an
> engineer like me initially).
>
> I think eventually - we will have to make some interpretations and
> assumptions, and eventually, the ASF might have to take some risks
> when reviewing and accepting such a proposal. But the risk-taking
> should be very well informed in this case so I think we should gather
> a lot of inputs and opinions on that.
>
> J
>
>
> On Mon, Sep 14, 2020 at 6:08 PM Matt Sicker  wrote:
> >
> > From a distribution standpoint, the point of these policies to me has
> > been to emphasize that anything we distribute here at Apache can be
> > safely used and copied under the terms of the Apache License. As such,
> > source releases have always been the target, though over time, Apache
> > has accumulated several end-user type projects that may or may not
> > have a developer audience that knows what to do with source code. The
> > binary distributions become a useful channel for projects so that
> > users can actually use the project without technical knowledge of
> > development environment setups and such. This raises a conundrum,
> > though, that nearly any non-trivial binary software artifact will
> > contain or link to code that is not distributed under the Apache
> > License, but it may be compatible (e.g., GPLv3 is compatible with
> > ALv2, but combining the two results in GPLv3 basically, not
> > ALv2+GPLv3; this doesn't change existing licenses of course). For our
> > end users downloading Apache artifacts, we've had a history of
> > publishing IP-safe source code that is easily used under the ALv2. I
> > think the historical problem behind why binary artifacts haven't been
> > raised to the same status involves clarifying the line between where
> > our artifacts end and a third party's begin. This is especially
> > apparent in languages where the reference implementation runtime is
> > GPL (e.g., OpenJDK, though that itself has an interesting history due
> > to Apache Harmony having been a thing at one point).
> >
> > From a security standpoint, distributing binaries requires more
> > infrastructural security to respond to potential malware infections,
> > CVEs in dependencies, etc.
> >
> >
> > On Mon, 14 Sep 2020 at 10:54, Jarek Potiuk 
> wrote:
> > >
> > > Oh yeah. I start realizing now how herculean it is :). No worries, I am
> > > afraid when you are back, the discussion will be just warming up :).
> > >
> > > Speaking of the "double standard" - the main reason really comes from
> > > licensing. When you compile something in that is GPL, your code starts
> to
> > > be bound by the licence. But when you just bundle it together in a
> software
> > > package - you are not.
> > >
> > > So this is pretty much unavoidable to apply different rules to those
> > > situations. No matter what - we have to make this distinction IMHO. But
> > > let'

Re: Controlling the images used for the builds/releases

2020-09-14 Thread Jarek Potiuk
Very true Matt.

I think this is really a crucial part of the proposal to define the
boundary between the Apache / Non-Apache artifacts (potentially with a
different, non-ASF compliant license).

The "compiled" vs. "packaged" that I proposed is one way of looking
at it, rather simple and straightforward to understand, verify, and
reason about. But I would love to hear other ideas - maybe some other
communities and OSS organizations approached it already and they came
up with some other ways of classifying it?

One thing that is quite important here - we are not really talking
about "releases" and we should continue avoiding the name. I have no
doubt that a proper release is a .tar.gz, signed and checksummed, on
Apache's SVN, containing sources and instructions on how to build the
software (including the convenience packages) using platforms and
tools available. There are no other "releases" by ASF, and I think
there should not be.
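
To illustrate the "checksummed" part, here is a minimal Python sketch that
compares a downloaded source archive against a published checksum. The
function names are hypothetical and SHA-512 is an assumed choice of
algorithm. It covers only the checksum, not the signature, which would need
a separate tool such as GnuPG.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(archive: Path, expected_hex: str) -> bool:
    """Return True if the archive's digest equals the published hex value."""
    return sha512_of(archive) == expected_hex.strip().lower()
```

A caller would pass the path of the downloaded .tar.gz and the hex string
read from the checksum file published next to it.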

I keep reminding myself, as I propose the changes, that
"convenience packages" are not "official" ASF software releases, so I
think the policies there - however legal and "correct" - do not have to
be that strict.

I am not a lawyer, so I cannot grasp all the implications - I am really
looking to the "crowd wisdom" here to understand all the consequences.
I think we will never get a 100% correct and "compilable" policy (so
to speak). My wife is a lawyer by education, so I know very well from
her that "law does not compile" (which was a bit surprising to an
engineer like me initially).

I think eventually we will have to make some interpretations and
assumptions, and the ASF might have to take some risks
when reviewing and accepting such a proposal. But the risk-taking
should be very well informed in this case, so I think we should gather
a lot of input and opinions on that.

J


On Mon, Sep 14, 2020 at 6:08 PM Matt Sicker  wrote:
>
> From a distribution standpoint, the point of these policies to me has
> been to emphasize that anything we distribute here at Apache can be
> safely used and copied under the terms of the Apache License. As such,
> source releases have always been the target, though over time, Apache
> has accumulated several end-user type projects that may or may not
> have a developer audience that knows what to do with source code. The
> binary distributions become a useful channel for projects so that
> users can actually use the project without technical knowledge of
> development environment setups and such. This raises a conundrum,
> though, that nearly any non-trivial binary software artifact will
> contain or link to code that is not distributed under the Apache
> License, but it may be compatible (e.g., GPLv3 is compatible with
> ALv2, but combining the two results in GPLv3 basically, not
> ALv2+GPLv3; this doesn't change existing licenses of course). For our
> end users downloading Apache artifacts, we've had a history of
> publishing IP-safe source code that is easily used under the ALv2. I
> think the historical problem behind why binary artifacts haven't been
> raised to the same status involves clarifying the line between where
> our artifacts end and a third party's begin. This is especially
> apparent in languages where the reference implementation runtime is
> GPL (e.g., OpenJDK, though that itself has an interesting history due
> to Apache Harmony having been a thing at one point).
>
> From a security standpoint, distributing binaries requires more
> infrastructural security to respond to potential malware infections,
> CVEs in dependencies, etc.
>
>
> On Mon, 14 Sep 2020 at 10:54, Jarek Potiuk  wrote:
> >
> > Oh yeah. I start realizing now how herculean it is :). No worries, I am
> > afraid when you are back, the discussion will be just warming up :).
> >
> > Speaking of the "double standard" - the main reason really comes from
> > licensing. When you compile something in that is GPL, your code starts to
> > be bound by the licence. But when you just bundle it together in a software
> > package - you are not.
> >
> > So this is pretty much unavoidable to apply different rules to those
> > situations. No matter what - we have to make this distinction IMHO. But
> > let's see what others say on that.  I'd love to hear your thought on that,
> > before you head out.
> >
> > J
> >
> >
> > On Mon, Sep 14, 2020 at 5:47 PM Joan Touzet  wrote:
> >
> > > Hi Jarek,
> > >
> > > I'm about to head out for 3 weeks, so I'm going to miss most of this
> > > discussion. I've done my best to leave comments in your document, but
> > > just picking out one topic in this thread:
> > >
> > > On 14/09/2020 02:40, Jarek Potiuk wrote:
> > > > Yeah - I see the point and to be honest, that was exactly my original
> > > > intention when I wrote the proposal. I modified it slightly to reflect
> > > that
> > > > - I think now after preparing the proposal that the "gist" of it is
> > > really
> > > > to introduce two kinds of convenience packages - one is the "compiled"

Re: Controlling the images used for the builds/releases

2020-09-14 Thread Matt Sicker
From a distribution standpoint, the point of these policies to me has
been to emphasize that anything we distribute here at Apache can be
safely used and copied under the terms of the Apache License. As such,
source releases have always been the target, though over time, Apache
has accumulated several end-user type projects that may or may not
have a developer audience that knows what to do with source code. The
binary distributions become a useful channel for projects so that
users can actually use the project without technical knowledge of
development environment setups and such. This raises a conundrum,
though, that nearly any non-trivial binary software artifact will
contain or link to code that is not distributed under the Apache
License, but it may be compatible (e.g., GPLv3 is compatible with
ALv2, but combining the two results in GPLv3 basically, not
ALv2+GPLv3; this doesn't change existing licenses of course). For our
end users downloading Apache artifacts, we've had a history of
publishing IP-safe source code that is easily used under the ALv2. I
think the historical problem behind why binary artifacts haven't been
raised to the same status involves clarifying the line between where
our artifacts end and a third party's begin. This is especially
apparent in languages where the reference implementation runtime is
GPL (e.g., OpenJDK, though that itself has an interesting history due
to Apache Harmony having been a thing at one point).

From a security standpoint, distributing binaries requires more
infrastructural security to respond to potential malware infections,
CVEs in dependencies, etc.


On Mon, 14 Sep 2020 at 10:54, Jarek Potiuk  wrote:
>
> Oh yeah. I start realizing now how herculean it is :). No worries, I am
> afraid when you are back, the discussion will be just warming up :).
>
> Speaking of the "double standard" - the main reason really comes from
> licensing. When you compile something in that is GPL, your code starts to
> be bound by the licence. But when you just bundle it together in a software
> package - you are not.
>
> So this is pretty much unavoidable to apply different rules to those
> situations. No matter what - we have to make this distinction IMHO. But
> let's see what others say on that.  I'd love to hear your thought on that,
> before you head out.
>
> J
>
>
> On Mon, Sep 14, 2020 at 5:47 PM Joan Touzet  wrote:
>
> > Hi Jarek,
> >
> > I'm about to head out for 3 weeks, so I'm going to miss most of this
> > discussion. I've done my best to leave comments in your document, but
> > just picking out one topic in this thread:
> >
> > On 14/09/2020 02:40, Jarek Potiuk wrote:
> > > Yeah - I see the point and to be honest, that was exactly my original
> > > intention when I wrote the proposal. I modified it slightly to reflect
> > that
> > > - I think now after preparing the proposal that the "gist" of it is
> > really
> > > to introduce two kinds of convenience packages - one is the "compiled"
> > > package (which should be far more restricted what it contains due to
> > > limitations of licences such as GPL) and the other is simply "packaged"
> > > software - where we put independent software or binaries in a single
> > > "convenience" package but it does not have as far-reaching
> > > legal/licence consequences as compiled packages.
> > >
> > > The criteria I proposed introduce an interesting concept - the recursive
> > > definition of "official" packages - that was the most "difficult" part
> > > to come up with. But I believe as long as the criteria we come up with
> > can
> > > be recursively applied to any binaries or reference to those binaries up
> > to
> > > the end of the recursive chain of dependencies and as long as we provide
> > > instructions on how to build those binaries by the "power" users, I
> > believe
> > > it should be perfectly fine to include such binaries in "packaged"
> > software
> > > without explicitly releasing all the sources for them.
> > >
> > > So I tried to put it in the way to make it clear that the original
> > > limitations remain in place for the "compiled" package (effectively I am
> > > not changing any wording in the policy regarding those) but I (hope) make
> > > it clear that other limitations and criteria apply to "packaged" software
> > > using those modern tools like Docker/Helm but also any form of
> > installable
> > > packages (like Windows installers). I've also specifically listed the
> > > "windows installers" as an example package.
> >
> > I don't like the double standard of "compiled" vs. "packaged" software.
> > It's hard to understand when to apply which, and creates an un-level
> > playing field. Not every ASF project can create both, and you're using a
> > different ruler for each. I realize it was your intent to avoid clouding
> > the water, and to apply stricter rules to one vs. the other, but I feel
> > this is just continuing the double-standard I previously mentioned,
> > albeit in a different form.
> >
> > G

Re: Controlling the images used for the builds/releases

2020-09-14 Thread Joan Touzet

On 14/09/2020 11:54, Jarek Potiuk wrote:
> Oh yeah. I start realizing now how herculean it is :). No worries, I am
> afraid when you are back, the discussion will be just warming up :).
>
> Speaking of the "double standard" - the main reason really comes from
> licensing. When you compile something in that is GPL, your code starts to
> be bound by the licence. But when you just bundle it together in a software
> package - you are not.
>
> So this is pretty much unavoidable to apply different rules to those
> situations. No matter what - we have to make this distinction IMHO. But
> let's see what others say on that.  I'd love to hear your thought on that,
> before you head out.


Taking CouchDB as an example: shipping *just* the compiled .beam files is
possible but helps no one, because they require a functional Erlang
interpreter alongside them. In other words, they are not a runnable asset
on their own.


I believe you can compile Erlang against 100% non-GPL assets, but this
is not common. How many people don't use GNU libc on Linux?


Thus the double standard: it allows "binary packages" only for
those languages where the compiled asset is, on its own, sufficient to
run the program. This is not even true for e.g. Node.js or Python, any
time there are (potentially GNU) libc bindings.



> J
>
>
> On Mon, Sep 14, 2020 at 5:47 PM Joan Touzet  wrote:
>
> > Hi Jarek,
> >
> > I'm about to head out for 3 weeks, so I'm going to miss most of this
> > discussion. I've done my best to leave comments in your document, but
> > just picking out one topic in this thread:
> >
> > On 14/09/2020 02:40, Jarek Potiuk wrote:
> > > Yeah - I see the point and to be honest, that was exactly my original
> > > intention when I wrote the proposal. I modified it slightly to reflect that
> > > - I think now after preparing the proposal that the "gist" of it is really
> > > to introduce two kinds of convenience packages - one is the "compiled"
> > > package (which should be far more restricted what it contains due to
> > > limitations of licences such as GPL) and the other is simply "packaged"
> > > software - where we put independent software or binaries in a single
> > > "convenience" package but it does not have as far-reaching
> > > legal/licence consequences as compiled packages.
> > >
> > > The criteria I proposed introduce an interesting concept - the recursive
> > > definition of "official" packages - that was the most "difficult" part
> > > to come up with. But I believe as long as the criteria we come up with can
> > > be recursively applied to any binaries or reference to those binaries up to
> > > the end of the recursive chain of dependencies and as long as we provide
> > > instructions on how to build those binaries by the "power" users, I believe
> > > it should be perfectly fine to include such binaries in "packaged" software
> > > without explicitly releasing all the sources for them.
> > >
> > > So I tried to put it in the way to make it clear that the original
> > > limitations remain in place for the "compiled" package (effectively I am
> > > not changing any wording in the policy regarding those) but I (hope) make
> > > it clear that other limitations and criteria apply to "packaged" software
> > > using those modern tools like Docker/Helm but also any form of installable
> > > packages (like Windows installers). I've also specifically listed the
> > > "windows installers" as an example package.
> >
> > I don't like the double standard of "compiled" vs. "packaged" software.
> > It's hard to understand when to apply which, and creates an un-level
> > playing field. Not every ASF project can create both, and you're using a
> > different ruler for each. I realize it was your intent to avoid clouding
> > the water, and to apply stricter rules to one vs. the other, but I feel
> > this is just continuing the double-standard I previously mentioned,
> > albeit in a different form.
> >
> > Good luck with the effort, and thanks for taking on this herculean task.
> >
> > -Joan
> >
> > > J.
> > >
> > >
> > > On Mon, Sep 14, 2020 at 2:57 AM Allen Wittenauer  wrote:
> > >
> > >>> On Sep 13, 2020, at 2:55 PM, Joan Touzet  wrote:
> > >>>> I think that any release of ASF software must have corresponding sources
> > >>>> that can be use to generate those from. Even if there are some binary
> > >>>> files, those too should be generated from some kind of sources or
> > >>>> "officially released" binaries that come from some sources. I'd love to get
> > >>>> some more concrete examples of where it is not possible.
> > >>>
> > >>> Sure, this is totally possible. I'm just saying that the amount of
> > >>> source is extreme in the case where you're talking about a desktop app that
> > >>> runs in Java or Electron (Chrome as a desktop app), as two examples.
> > >>
> > >>
> > >> ... and mostly impossible when talking about Windows containers.

Re: Controlling the images used for the builds/releases

2020-09-14 Thread Jarek Potiuk
Oh yeah. I start realizing now how herculean it is :). No worries, I am
afraid when you are back, the discussion will be just warming up :).

Speaking of the "double standard" - the main reason really comes from
licensing. When you compile something in that is GPL, your code starts to
be bound by the licence. But when you just bundle it together in a software
package - you are not.

So this is pretty much unavoidable to apply different rules to those
situations. No matter what - we have to make this distinction IMHO. But
let's see what others say on that.  I'd love to hear your thought on that,
before you head out.

J


On Mon, Sep 14, 2020 at 5:47 PM Joan Touzet  wrote:

> Hi Jarek,
>
> I'm about to head out for 3 weeks, so I'm going to miss most of this
> discussion. I've done my best to leave comments in your document, but
> just picking out one topic in this thread:
>
> On 14/09/2020 02:40, Jarek Potiuk wrote:
> > Yeah - I see the point and to be honest, that was exactly my original
> > intention when I wrote the proposal. I modified it slightly to reflect
> that
> > - I think now after preparing the proposal that the "gist" of it is
> really
> > to introduce two kinds of convenience packages - one is the "compiled"
> > package (which should be far more restricted what it contains due to
> > limitations of licences such as GPL) and the other is simply "packaged"
> > software - where we put independent software or binaries in a single
> > "convenience" package but it does not have as far-reaching
> > legal/licence consequences as compiled packages.
> >
> > The criteria I proposed introduce an interesting concept - the recursive
> > definition of "official" packages - that was the most "difficult" part
> > to come up with. But I believe as long as the criteria we come up with
> can
> > be recursively applied to any binaries or reference to those binaries up
> to
> > the end of the recursive chain of dependencies and as long as we provide
> > instructions on how to build those binaries by the "power" users, I
> believe
> > it should be perfectly fine to include such binaries in "packaged"
> software
> > without explicitly releasing all the sources for them.
> >
> > So I tried to put it in the way to make it clear that the original
> > limitations remain in place for the "compiled" package (effectively I am
> > not changing any wording in the policy regarding those) but I (hope) make
> > it clear that other limitations and criteria apply to "packaged" software
> > using those modern tools like Docker/Helm but also any form of
> installable
> > packages (like Windows installers). I've also specifically listed the
> > "windows installers" as an example package.
>
> I don't like the double standard of "compiled" vs. "packaged" software.
> It's hard to understand when to apply which, and creates an un-level
> playing field. Not every ASF project can create both, and you're using a
> different ruler for each. I realize it was your intent to avoid clouding
> the water, and to apply stricter rules to one vs. the other, but I feel
> this is just continuing the double-standard I previously mentioned,
> albeit in a different form.
>
> Good luck with the effort, and thanks for taking on this herculean task.
>
> -Joan
>
> >
> > J.
> >
> >
> > On Mon, Sep 14, 2020 at 2:57 AM Allen Wittenauer
> >  wrote:
> >
> >>
> >>
> >>> On Sep 13, 2020, at 2:55 PM, Joan Touzet  wrote:
> >>>> I think that any release of ASF software must have corresponding sources
> >>>> that can be use to generate those from. Even if there are some binary
> >>>> files, those too should be generated from some kind of sources or
> >>>> "officially released" binaries that come from some sources. I'd love to get
> >>>> some more concrete examples of where it is not possible.
> >>>
> >>> Sure, this is totally possible. I'm just saying that the amount of
> >>> source is extreme in the case where you're talking about a desktop app that
> >>> runs in Java or Electron (Chrome as a desktop app), as two examples.
> >>
> >>
> >> ... and mostly impossible when talking about Windows containers.
> >>
> >>
> >
>


-- 

Jarek Potiuk
Polidea  | Principal Software Engineer

M: +48 660 796 129 <+48660796129>


Re: Controlling the images used for the builds/releases

2020-09-14 Thread Joan Touzet

Hi Jarek,

I'm about to head out for 3 weeks, so I'm going to miss most of this 
discussion. I've done my best to leave comments in your document, but 
just picking out one topic in this thread:


On 14/09/2020 02:40, Jarek Potiuk wrote:
> Yeah - I see the point and to be honest, that was exactly my original
> intention when I wrote the proposal. I modified it slightly to reflect that
> - I think now after preparing the proposal that the "gist" of it is really
> to introduce two kinds of convenience packages - one is the "compiled"
> package (which should be far more restricted what it contains due to
> limitations of licences such as GPL) and the other is simply "packaged"
> software - where we put independent software or binaries in a single
> "convenience" package but it does not have as far-reaching
> legal/licence consequences as compiled packages.
>
> The criteria I proposed introduce an interesting concept - the recursive
> definition of "official" packages - that was the most "difficult" part
> to come up with. But I believe as long as the criteria we come up with can
> be recursively applied to any binaries or reference to those binaries up to
> the end of the recursive chain of dependencies and as long as we provide
> instructions on how to build those binaries by the "power" users, I believe
> it should be perfectly fine to include such binaries in "packaged" software
> without explicitly releasing all the sources for them.
>
> So I tried to put it in the way to make it clear that the original
> limitations remain in place for the "compiled" package (effectively I am
> not changing any wording in the policy regarding those) but I (hope) make
> it clear that other limitations and criteria apply to "packaged" software
> using those modern tools like Docker/Helm but also any form of installable
> packages (like Windows installers). I've also specifically listed the
> "windows installers" as an example package.


I don't like the double standard of "compiled" vs. "packaged" software. 
It's hard to understand when to apply which, and creates an un-level 
playing field. Not every ASF project can create both, and you're using a 
different ruler for each. I realize it was your intent to avoid clouding 
the water, and to apply stricter rules to one vs. the other, but I feel 
this is just continuing the double-standard I previously mentioned, 
albeit in a different form.


Good luck with the effort, and thanks for taking on this herculean task.

-Joan



> J.
>
>
> On Mon, Sep 14, 2020 at 2:57 AM Allen Wittenauer  wrote:
>
> > > On Sep 13, 2020, at 2:55 PM, Joan Touzet  wrote:
> > >> I think that any release of ASF software must have corresponding sources
> > >> that can be use to generate those from. Even if there are some binary
> > >> files, those too should be generated from some kind of sources or
> > >> "officially released" binaries that come from some sources. I'd love to get
> > >> some more concrete examples of where it is not possible.
> > >
> > > Sure, this is totally possible. I'm just saying that the amount of
> > > source is extreme in the case where you're talking about a desktop app that
> > > runs in Java or Electron (Chrome as a desktop app), as two examples.
> >
> >
> > ... and mostly impossible when talking about Windows containers.

Re: Controlling the images used for the builds/releases

2020-09-13 Thread Jarek Potiuk
Yeah - I see the point and to be honest, that was exactly my original
intention when I wrote the proposal. I modified it slightly to reflect that.
I think now, after preparing the proposal, that the "gist" of it is really
to introduce two kinds of convenience packages - one is the "compiled"
package (which should be far more restricted in what it contains due to
limitations of licences such as GPL) and the other is simply "packaged"
software - where we put independent software or binaries in a single
"convenience" package but it does not have as far-reaching
legal/licence consequences as compiled packages.

The criteria I proposed introduce an interesting concept - the recursive
definition of "official" packages - that was the most "difficult" part
to come up with. But I believe that as long as the criteria we come up with can
be recursively applied to any binaries, or references to those binaries, up to
the end of the recursive chain of dependencies, and as long as we provide
instructions on how "power" users can build those binaries, it
should be perfectly fine to include such binaries in "packaged" software
without explicitly releasing all the sources for them.
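
One way to picture the recursive check described above is the following
minimal Python sketch. It is only an illustration under stated assumptions,
not part of the proposal: the Artifact class, its two boolean fields, and the
satisfies_criteria function are hypothetical names, and the real criteria in
the Cwiki document are richer than two flags.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical model of one binary (or a reference to one) in a package."""
    name: str
    officially_released: bool   # assumed flag: upstream publishes it as an official package
    build_instructions: bool    # assumed flag: we document how a "power" user can rebuild it
    dependencies: list[Artifact] = field(default_factory=list)


def satisfies_criteria(artifact: Artifact, _seen: set[str] | None = None) -> bool:
    """Apply the same check to the artifact and to its whole dependency chain."""
    seen = set() if _seen is None else _seen
    if artifact.name in seen:   # already visited: avoids loops on shared or cyclic deps
        return True
    seen.add(artifact.name)
    if not (artifact.officially_released and artifact.build_instructions):
        return False
    return all(satisfies_criteria(dep, seen) for dep in artifact.dependencies)


# Illustrative data only; none of these names refer to real images.
base = Artifact("base-os-image", True, True)
runtime = Artifact("language-runtime", True, True, [base])
blob = Artifact("unknown-binary-blob", False, False)

good = Artifact("convenience-image", True, True, [runtime])
bad = Artifact("convenience-image-with-blob", True, True, [runtime, blob])

print(satisfies_criteria(good))  # True
print(satisfies_criteria(bad))   # False
```

The point of the sketch is that a single failing binary anywhere in the
chain makes the whole "packaged" artifact fail, which is what applying the
criteria "up to the end of the recursive chain of dependencies" implies.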

So I tried to put it in a way that makes it clear that the original
limitations remain in place for the "compiled" package (effectively I am
not changing any wording in the policy regarding those), but I hope I make
it clear that other limitations and criteria apply to "packaged" software
using modern tools like Docker/Helm, and also to any form of installable
packages (like Windows installers). I've also specifically listed the
"Windows installers" as an example package.

J.


On Mon, Sep 14, 2020 at 2:57 AM Allen Wittenauer
 wrote:

>
>
> > On Sep 13, 2020, at 2:55 PM, Joan Touzet  wrote:
> >> I think that any release of ASF software must have corresponding sources
> >> that can be use to generate those from. Even if there are some binary
> >> files, those too should be generated from some kind of sources or
> >> "officially released" binaries that come from some sources. I'd love to
> get
> >> some more concrete examples of where it is not possible.
> >
> > Sure, this is totally possible. I'm just saying that the amount of
> source is extreme in the case where you're talking about a desktop app that
> runs in Java or Electron (Chrome as a desktop app), as two examples.
>
>
> ... and mostly impossible when talking about Windows containers.
>
>



Re: Controlling the images used for the builds/releases

2020-09-13 Thread Allen Wittenauer



> On Sep 13, 2020, at 2:55 PM, Joan Touzet  wrote:
>> I think that any release of ASF software must have corresponding sources
>> that can be use to generate those from. Even if there are some binary
>> files, those too should be generated from some kind of sources or
>> "officially released" binaries that come from some sources. I'd love to get
>> some more concrete examples of where it is not possible.
> 
> Sure, this is totally possible. I'm just saying that the amount of source is 
> extreme in the case where you're talking about a desktop app that runs in 
> Java or Electron (Chrome as a desktop app), as two examples.


... and mostly impossible when talking about Windows containers.



Re: Controlling the images used for the builds/releases

2020-09-13 Thread Joan Touzet

On 2020-09-13 5:19 p.m., Jarek Potiuk wrote:
> Can you please make an inline comment in the document? the Cwiki allows
> inline comments, just select a paragraph and comment it there.  This is the
> easiest way to keep it focused in the document. I  am not sure if
> understand the Open-Office specific things, i'd love to understand that
> though (I used Open-Office for years) :)

Done, and I expanded on this point.

> I think that any release of ASF software must have corresponding sources
> that can be use to generate those from. Even if there are some binary
> files, those too should be generated from some kind of sources or
> "officially released" binaries that come from some sources. I'd love to get
> some more concrete examples of where it is not possible.


Sure, this is totally possible. I'm just saying that the amount of 
source is extreme in the case where you're talking about a desktop app 
that runs in Java or Electron (Chrome as a desktop app), as two examples.


-Joan


> J.
>
> On Sun, Sep 13, 2020 at 11:09 PM Joan Touzet  wrote:
>
> > HI Jarek,
> >
> > Can you comment on one specific thing? In Proposal 1 you still leave the
> > text "...MUST only add binary/bytecode files". This is not possible for
> > convenience packages in many situations - for instance OpenOffice or
> > other languages - where providing a full release of a product requires a
> > language runtime. It has always bothered me that this text effectively
> > prevents redistribution of binary assets in the packages that are not
> > strictly speaking derived from the source code.
> >
> > As you go far beyond this with the container packaging in Proposal 2, I
> > believe Proposal 1 needs to be modified to match. In my opinion a
> > suitable replacement would be something like:
> >
> > "In all such cases...version number as the source release, as MUST
> > include only the binary/bytecode files that are necessary, via the
> > compiling and packaging of that source code release and its
> > dependencies, to produce a functional deliverable. All instructions..."
> >
> > -Joan
> >
> > On 2020-09-13 4:40 p.m., Jarek Potiuk wrote:
> > > Just for your information - after a discussion in the ComDev mailing list.
> > > I created a proposal for Apache Software Foundation to introduce changes to
> > > the "ASF release policies", to make it clear and straightforward to release
> > > "convenience packages" in the form of "software packaging" (such as Helm
> > > Charts and Container Images) rather than "compiled packages" as recognised
> > > so far by the ASF policies.
> > >
> > > The proposal is here:
> > >
> > > https://cwiki.apache.org/confluence/display/COMDEV/Updates+of+policies+for+the+convenience+packages
> > >
> > > The discussion in the ComDev ASF mailing list is here:
> > >
> > > https://lists.apache.org/thread.html/r49c3ef0a8423664c564c0c2719056662021f03b5678ef5b249892c10%40%3Cdev.community.apache.org%3E
> > >
> > > We are going to discuss it and propose to the ASF board to vote on the
> > > changes.
> > >
> > > I look forward to all comments and I hope it can pave the way for the ASF
> > > to provide a coherent approach for releasing Container Images, Helm Charts
> > > for all ASF projects.
> > >
> > > On Mon, Aug 31, 2020 at 9:23 PM Jarek Potiuk  wrote:
> > >
> > >> Just to revive this thread and let you know what we've done in Airflow.
> > >>
> > >> We merged changes to our repository that allow our users to rebuild all
> > >> images if they need to -using official sources. It's not very involved and
> > >> not a lot of code to maintain:
> > >> https://github.com/apache/airflow/pull/9650/
> > >> Next time when we release Airflow Sources including the Helm Chart, any of
> > >> our users will be able to rebuild all the images used in charts from the
> > >> ASF-released source package.
> > >>
> > >> The whole discussion ended up to be not about the Licence, but about the
> > >> content of the official ASF source package release.
> > >>
> > >> I personally think this is the only way to fulfill this chapter from ASF
> > >> release policy:
> > >>
> > >> http://www.apache.org/legal/release-policy.html#what-must-every-release-contain
> > >>
> > >>> Every ASF release must contain a source package, which must be sufficient
> > >>> for a user to build and test the release provided they have access to the
> > >>> appropriate platform and tools.
> > >>
> > >> I would love to hear other thoughts about it.
> > >>
> > >> J.




On Tue, Jun 23, 2020 at 11:42 PM Roman Shaposhnik 


wrote:


On Tue, Jun 23, 2020 at 2:26 AM Jarek Potiuk 


wrote:




My understanding the bigger problem is the license of the dependency

(and

their dependencies) rather than the official/unofficial status.  For

Apache

Yetus' test-patch functionality, we defaulted all of our plugins to

off

because we couldn't depend upon GPL'd binaries being available or

giving

the impression that they were required.  By doing so, it put the onus

on

the user to specifically enable features that depends upon GPL'd
functionality.  It also pretty much nukes any idea of being user

friendly.

:(



Indeed - Licensing is important, especially for source code

redistribution.

We used to have some GPL-install-on-your-own-if-you-want in the past

but

those dependencies are gone already.





2) If it's not - how do we determine which images are "officially
main

Re: Controlling the images used for the builds/releases

2020-09-13 Thread Jarek Potiuk
Can you please make an inline comment in the document? The Cwiki allows
inline comments: just select a paragraph and comment on it there. This is the
easiest way to keep the discussion focused in the document. I am not sure if
I understand the OpenOffice-specific things; I'd love to understand them
though (I used OpenOffice for years) :)

I think that any release of ASF software must have corresponding sources
from which it can be generated. Even if there are some binary
files, those too should be generated from some kind of sources or from
"officially released" binaries that come from some sources. I'd love to get
some more concrete examples of where it is not possible.

J.

On Sun, Sep 13, 2020 at 11:09 PM Joan Touzet  wrote:

> HI Jarek,
>
> Can you comment on one specific thing? In Proposal 1 you still leave the
> text "...MUST only add binary/bytecode files". This is not possible for
> convenience packages in many situations - for instance OpenOffice or
> other languages - where providing a full release of a product requires a
> language runtime. It has always bothered me that this text effectively
> prevents redistribution of binary assets in the packages that are not
> strictly speaking derived from the source code.
>
> As you go far beyond this with the container packaging in Proposal 2, I
> believe Proposal 1 needs to be modified to match. In my opinion a
> suitable replacement would be something like:
>
> "In all such cases...version number as the source release, as MUST
> include only the binary/bytecode files that are necessary, via the
> compiling and packaging of that source code release and its
> dependencies, to produce a functional deliverable. All instructions..."
>
> -Joan
>
> On 2020-09-13 4:40 p.m., Jarek Potiuk wrote:
> > Just for your information - after a discussion in the ComDev mailing list.
> > I created a proposal for Apache Software Foundation to introduce changes to
> > the "ASF release policies", to make it clear and straightforward to release
> > "convenience packages" in the form of "software packaging" (such as Helm
> > Charts and Container Images) rather than "compiled packages" as recognised
> > so far by the ASF policies.
> >
> > The proposal is here:
> > https://cwiki.apache.org/confluence/display/COMDEV/Updates+of+policies+for+the+convenience+packages
> >
> > The discussion in the ComDev ASF mailing list is here:
> > https://lists.apache.org/thread.html/r49c3ef0a8423664c564c0c2719056662021f03b5678ef5b249892c10%40%3Cdev.community.apache.org%3E
> >
> > We are going to discuss it and propose to the ASF board to vote on the
> > changes.
> >
> > I look forward to all comments and I hope it can pave the way for the ASF
> > to provide a coherent approach for releasing Container Images, Helm Charts
> > for all ASF projects.
> >
> > On Mon, Aug 31, 2020 at 9:23 PM Jarek Potiuk 
> > wrote:
> >
> >> Just to revive this thread and let you know what we've done in Airflow.
> >>
> >> We merged changes to our repository that allow our users to rebuild all
> >> images if they need to -using official sources. It's not very involved and
> >> not a lot of code to maintain:
> >> https://github.com/apache/airflow/pull/9650/
> >> Next time when we release Airflow Sources including the Helm Chart, any of
> >> our users will be able to rebuild all the images used in charts from the
> >> ASF-released source package.
> >>
> >> The whole discussion ended up to be not about the Licence, but about the
> >> content of the official ASF source package release.
> >>
> >> I personally think this is the only way to fulfill this chapter from ASF
> >> release policy:
> >> http://www.apache.org/legal/release-policy.html#what-must-every-release-contain
> >>
> >>> Every ASF release must contain a source package, which must be sufficient
> >>> for a user to build and test the release provided they have access to the
> >>> appropriate platform and tools.
> >>
> >>
> >> I would love to hear other thoughts about it.
> >>
> >> J.
> >>
> >>
> >>
> >>
> >> On Tue, Jun 23, 2020 at 11:42 PM Roman Shaposhnik
> >> wrote:
> >>
> >>> On Tue, Jun 23, 2020 at 2:26 AM Jarek Potiuk
> >>> wrote:
> >>> >
> >>> > >
> >>> > > My understanding the bigger problem is the license of the dependency (and
> >>> > > their dependencies) rather than the official/unofficial status.  For Apache
> >>> > > Yetus' test-patch functionality, we defaulted all of our plugins to off
> >>> > > because we couldn't depend upon GPL'd binaries being available or giving
> >>> > > the impression that they were required.  By doing so, it put the onus on
> >>> > > the user to specifically enable features that depends upon GPL'd
> >>> > > functionality.  It also pretty much nukes any idea of being user friendly.
> >>> > > :(
> >>> > >
> >>> >
> >>> > Indeed - Licensing is important, especially for source code redistribution.
> >>> > We used to have some GPL-install-on-your-own-if-you-want in the past but
> >>> > those de

Re: Controlling the images used for the builds/releases

2020-09-13 Thread Joan Touzet

Hi Jarek,

Can you comment on one specific thing? In Proposal 1 you still leave the 
text "...MUST only add binary/bytecode files". This is not possible for 
convenience packages in many situations - for instance OpenOffice or 
other languages - where providing a full release of a product requires a 
language runtime. It has always bothered me that this text effectively 
prevents redistribution of binary assets in the packages that are not 
strictly speaking derived from the source code.


As you go far beyond this with the container packaging in Proposal 2, I 
believe Proposal 1 needs to be modified to match. In my opinion a 
suitable replacement would be something like:


"In all such cases...version number as the source release, and MUST 
include only the binary/bytecode files that are necessary, via the 
compiling and packaging of that source code release and its 
dependencies, to produce a functional deliverable. All instructions..."


-Joan

On 2020-09-13 4:40 p.m., Jarek Potiuk wrote:

Just for your information - after a discussion in the ComDev mailing list.
I created a proposal for Apache Software Foundation to introduce changes to
the "ASF release policies", to make it clear and straightforward to release
"convenience packages" in the form of "software packaging" (such as Helm
Charts and Container Images) rather than "compiled packages" as recognised
so far by the ASF policies.

The proposal is here:
https://cwiki.apache.org/confluence/display/COMDEV/Updates+of+policies+for+the+convenience+packages

The discussion in the ComDev ASF mailing list is here:
https://lists.apache.org/thread.html/r49c3ef0a8423664c564c0c2719056662021f03b5678ef5b249892c10%40%3Cdev.community.apache.org%3E

We are going to discuss it and propose to the ASF board to vote on the
changes.

I look forward to all comments and I hope it can pave the way for the ASF
to provide a coherent approach for releasing Container Images, Helm Charts
for all ASF projects.

On Mon, Aug 31, 2020 at 9:23 PM Jarek Potiuk 
wrote:


Just to revive this thread and let you know what we've done in Airflow.

We merged changes to our repository that allow our users to rebuild all
images if they need to -using official sources. It's not very involved and
not a lot of code to maintain:
https://github.com/apache/airflow/pull/9650/
Next time when we release Airflow Sources including the Helm Chart, any of
our users will be able to rebuild all the images used in charts from the
ASF-released source package.

The whole discussion ended up to be not about the Licence, but about the
content of the official ASF source package release.

I personally think this is the only way to fulfill this chapter from ASF
release policy:
http://www.apache.org/legal/release-policy.html#what-must-every-release-contain

Every ASF release must contain a source package, which must be sufficient
for a user to build and test the release provided they have access to the
appropriate platform and tools.



I would love to hear other thoughts about it.

J.




On Tue, Jun 23, 2020 at 11:42 PM Roman Shaposhnik 
wrote:


On Tue, Jun 23, 2020 at 2:26 AM Jarek Potiuk 
wrote:




My understanding the bigger problem is the license of the dependency (and
their dependencies) rather than the official/unofficial status.  For Apache
Yetus' test-patch functionality, we defaulted all of our plugins to off
because we couldn't depend upon GPL'd binaries being available or giving
the impression that they were required.  By doing so, it put the onus on
the user to specifically enable features that depends upon GPL'd
functionality.  It also pretty much nukes any idea of being user friendly.
:(

Indeed - Licensing is important, especially for source code redistribution.
We used to have some GPL-install-on-your-own-if-you-want in the past but
those dependencies are gone already.





2) If it's not - how do we determine which images are "officially
maintained".

Keep in mind that Docker themselves brand their images as
'official' when they actually come from Docker instead of the organizations
that own that particular piece of software.  It just adds to the complexity.

Not really. We actually plan to make our own Apache Airflow Docker image as
official one. Docker has very clear guidelines on how to make images
"official" and it https://docs.docker.com/docker-hub/official_images/ and
there is quite a long list of those:
https://github.com/docker-library/official-images/tree/master/library -
most of them maintained by the "authors" of the image. Docker has a
dedicated team that reviews, checks those images and they encourage that
the "authors" maintain them. Quote from Docker's docs: "While it is
preferable to have upstream software authors maintaining their
corresponding Official Images, this is not a strict requirement."

3) If yes - how do we put the boundary - when image is acceptable? Are
there any criteria we can use or/ constraints we can put on the
licences/
Re: Controlling the images used for the builds/releases

2020-09-13 Thread Jarek Potiuk
Just for your information: after a discussion on the ComDev mailing list,
I created a proposal for the Apache Software Foundation to introduce changes to
the "ASF release policies", to make it clear and straightforward to release
"convenience packages" in the form of "software packaging" (such as Helm
Charts and Container Images) rather than "compiled packages" as recognised
so far by the ASF policies.

The proposal is here:
https://cwiki.apache.org/confluence/display/COMDEV/Updates+of+policies+for+the+convenience+packages

The discussion in the ComDev ASF mailing list is here:
https://lists.apache.org/thread.html/r49c3ef0a8423664c564c0c2719056662021f03b5678ef5b249892c10%40%3Cdev.community.apache.org%3E

We are going to discuss it and propose to the ASF board to vote on the
changes.

I look forward to all comments and I hope it can pave the way for the ASF
to provide a coherent approach to releasing Container Images and Helm Charts
for all ASF projects.

On Mon, Aug 31, 2020 at 9:23 PM Jarek Potiuk 
wrote:

> Just to revive this thread and let you know what we've done in Airflow.
>
> We merged changes to our repository that allow our users to rebuild all
> images if they need to -using official sources. It's not very involved and
> not a lot of code to maintain:
> https://github.com/apache/airflow/pull/9650/
> Next time when we release Airflow Sources including the Helm Chart, any of
> our users will be able to rebuild all the images used in charts from the
> ASF-released source package.
>
> The whole discussion ended up to be not about the Licence, but about the
> content of the official ASF source package release.
>
> I personally think this is the only way to fulfill this chapter from ASF
> release policy:
> http://www.apache.org/legal/release-policy.html#what-must-every-release-contain
>
>> Every ASF release must contain a source package, which must be sufficient
>> for a user to build and test the release provided they have access to the
>> appropriate platform and tools.
>
>
> I would love to hear other thoughts about it.
>
> J.
>
>
>
>
> On Tue, Jun 23, 2020 at 11:42 PM Roman Shaposhnik 
> wrote:
>
>> On Tue, Jun 23, 2020 at 2:26 AM Jarek Potiuk 
>> wrote:
>> >
>> > >
>> > > My understanding the bigger problem is the license of the dependency
>> (and
>> > > their dependencies) rather than the official/unofficial status.  For
>> Apache
>> > > Yetus' test-patch functionality, we defaulted all of our plugins to
>> off
>> > > because we couldn't depend upon GPL'd binaries being available or
>> giving
>> > > the impression that they were required.  By doing so, it put the onus
>> on
>> > > the user to specifically enable features that depends upon GPL'd
>> > > functionality.  It also pretty much nukes any idea of being user
>> friendly.
>> > > :(
>> > >
>> >
>> > Indeed - Licensing is important, especially for source code
>> redistribution.
>> > We used to have some GPL-install-on-your-own-if-you-want in the past but
>> > those dependencies are gone already.
>> >
>> >
>> > >
>> > > > 2) If it's not - how do we determine which images are "officially
>> > > > maintained".
>> > >
>> > > Keep in mind that Docker themselves brand their images as
>> > > 'official' when they actually come from Docker instead of the
>> organizations
>> > > that own that particular piece of software.  It just adds to the
>> complexity.
>> > >
>> >
>> > Not really. We actually plan to make our own Apache Airflow Docker
>> image as
>> > official one. Docker has very clear guidelines on how to make images
>> > "official" and it https://docs.docker.com/docker-hub/official_images/
>> and
>> > there is quite a long list of those:
>> > https://github.com/docker-library/official-images/tree/master/library -
>> > most of them maintained by the "authors" of the image. Docker has a
>> > dedicated team that reviews, checks those images and they encourage that
>> > the "authors" maintain them. Quote from Docker's docs: "While it is
>> > preferable to have upstream software authors maintaining their
>> > corresponding Official Images, this is not a strict requirement."
>> >
>> > >
>> > > > 3) If yes - how do we put the boundary - when image is acceptable?
>> Are
>> > > > there any criteria we can use or/ constraints we can put on the
>> > > > licences/organizations releasing the images we want to make
>> dependencies
>> > > > for released code of ours?
>> > >
>> > > License means everything.
>> > >
>> >
>> > For software distribution - true. It is the "blocker". But I think my
>> > question goes a bit beyond that - i.e. whether it's ok to
>> encourage/depend
>> > on the work maintained by other organizations than Apache if they are
>> not
>> > "official". My take is that it's likely OK to depend on that providing
>> that
>> > there is a kind of statement from those organizations that they
>> maintain it.
>> >
>> > An example risk I see:
>> > Airflow users depend heavily on helm chart to install Airflow - what
>> > happens if the comm

Re: Controlling the images used for the builds/releases

2020-08-31 Thread Jarek Potiuk
Just to revive this thread and let you know what we've done in Airflow.

We merged changes to our repository that allow our users to rebuild all
images, if they need to, using official sources. It's not very involved and
not a lot of code to maintain:
https://github.com/apache/airflow/pull/9650/
The next time we release Airflow sources including the Helm Chart, any of
our users will be able to rebuild all the images used in the chart from the
ASF-released source package.

The whole discussion turned out to be not about the licence, but about the
content of the official ASF source package release.

I personally think this is the only way to fulfill this section of the ASF
release policy:
http://www.apache.org/legal/release-policy.html#what-must-every-release-contain

> Every ASF release must contain a source package, which must be sufficient
> for a user to build and test the release provided they have access to the
> appropriate platform and tools.


I would love to hear other thoughts about it.

J.




On Tue, Jun 23, 2020 at 11:42 PM Roman Shaposhnik 
wrote:

> On Tue, Jun 23, 2020 at 2:26 AM Jarek Potiuk 
> wrote:
> >
> > >
> > > My understanding the bigger problem is the license of the dependency
> (and
> > > their dependencies) rather than the official/unofficial status.  For
> Apache
> > > Yetus' test-patch functionality, we defaulted all of our plugins to off
> > > because we couldn't depend upon GPL'd binaries being available or
> giving
> > > the impression that they were required.  By doing so, it put the onus
> on
> > > the user to specifically enable features that depends upon GPL'd
> > > functionality.  It also pretty much nukes any idea of being user
> friendly.
> > > :(
> > >
> >
> > Indeed - Licensing is important, especially for source code
> redistribution.
> > We used to have some GPL-install-on-your-own-if-you-want in the past but
> > those dependencies are gone already.
> >
> >
> > >
> > > > 2) If it's not - how do we determine which images are "officially
> > > > maintained".
> > >
> > > Keep in mind that Docker themselves brand their images as
> > > 'official' when they actually come from Docker instead of the
> organizations
> > > that own that particular piece of software.  It just adds to the
> complexity.
> > >
> >
> > Not really. We actually plan to make our own Apache Airflow Docker image
> as
> > official one. Docker has very clear guidelines on how to make images
> > "official" and it https://docs.docker.com/docker-hub/official_images/
> and
> > there is quite a long list of those:
> > https://github.com/docker-library/official-images/tree/master/library -
> > most of them maintained by the "authors" of the image. Docker has a
> > dedicated team that reviews, checks those images and they encourage that
> > the "authors" maintain them. Quote from Docker's docs: "While it is
> > preferable to have upstream software authors maintaining their
> > corresponding Official Images, this is not a strict requirement."
> >
> > >
> > > > 3) If yes - how do we put the boundary - when image is acceptable?
> Are
> > > > there any criteria we can use or/ constraints we can put on the
> > > > licences/organizations releasing the images we want to make
> dependencies
> > > > for released code of ours?
> > >
> > > License means everything.
> > >
> >
> > For software distribution - true. It is the "blocker". But I think my
> > question goes a bit beyond that - i.e. whether it's ok to
> encourage/depend
> > on the work maintained by other organizations than Apache if they are not
> > "official". My take is that it's likely OK to depend on that providing
> that
> > there is a kind of statement from those organizations that they maintain
> it.
> >
> > An example risk I see:
> > Airflow users depend heavily on helm chart to install Airflow - what
> > happens if the community agrees to implement something that the
> > organization does not want to implement (for whatever reason).
>
> FWIW: every corporation I ever worked at would commission a
> BlackDuck/Palamida
> report of a total software scans for its products. There was some
> amount of: "this
> ASF project pulls a dependency FOO (non ASF) that is declared to be
> licensed under
> the license X but it actually isn't -- here's why..."
>
> We trust our upstream dependencies, but I don't think we can verify
> them as a foundation.
> Hence we keep relying on that feedback coming from corporate sides.
>
> Thanks,
> Roman.
>
> > > > 4) If some images are not acceptable, should we bring them in and
> release
> > > > them in a community-managed registry?
> > >
> > > For the Apache Yetus docker image, we're including everything
> that
> > > the project supports.  *shrugs*
> > >
> >
> > Yeah. That's perfectly OK in many cases. Our docker image is also
> > self-containing. However, Airflow is a bit special here. Airflow is an
> > orchestrator which means that it can talk to many different services. We
> > have 58(!) "providers" - basically ext

Re: Controlling the images used for the builds/releases

2020-06-23 Thread Roman Shaposhnik
On Tue, Jun 23, 2020 at 2:26 AM Jarek Potiuk  wrote:
>
> >
> > My understanding the bigger problem is the license of the dependency (and
> > their dependencies) rather than the official/unofficial status.  For Apache
> > Yetus' test-patch functionality, we defaulted all of our plugins to off
> > because we couldn't depend upon GPL'd binaries being available or giving
> > the impression that they were required.  By doing so, it put the onus on
> > the user to specifically enable features that depends upon GPL'd
> > functionality.  It also pretty much nukes any idea of being user friendly.
> > :(
> >
>
> Indeed - Licensing is important, especially for source code redistribution.
> We used to have some GPL-install-on-your-own-if-you-want in the past but
> those dependencies are gone already.
>
>
> >
> > > 2) If it's not - how do we determine which images are "officially
> > > maintained".
> >
> > Keep in mind that Docker themselves brand their images as
> > 'official' when they actually come from Docker instead of the organizations
> > that own that particular piece of software.  It just adds to the complexity.
> >
>
> Not really. We actually plan to make our own Apache Airflow Docker image as
> official one. Docker has very clear guidelines on how to make images
> "official" and it https://docs.docker.com/docker-hub/official_images/  and
> there is quite a long list of those:
> https://github.com/docker-library/official-images/tree/master/library -
> most of them maintained by the "authors" of the image. Docker has a
> dedicated team that reviews, checks those images and they encourage that
> the "authors" maintain them. Quote from Docker's docs: "While it is
> preferable to have upstream software authors maintaining their
> corresponding Official Images, this is not a strict requirement."
>
> >
> > > 3) If yes - how do we put the boundary - when image is acceptable? Are
> > > there any criteria we can use or/ constraints we can put on the
> > > licences/organizations releasing the images we want to make dependencies
> > > for released code of ours?
> >
> > License means everything.
> >
>
> For software distribution - true. It is the "blocker". But I think my
> question goes a bit beyond that - i.e. whether it's ok to encourage/depend
> on the work maintained by other organizations than Apache if they are not
> "official". My take is that it's likely OK to depend on that providing that
> there is a kind of statement from those organizations that they maintain it.
>
> An example risk I see:
> Airflow users depend heavily on helm chart to install Airflow - what
> happens if the community agrees to implement something that the
> organization does not want to implement (for whatever reason).

FWIW: every corporation I ever worked at would commission a BlackDuck/Palamida
report of a total software scan for its products. There was some amount of:
"this ASF project pulls a dependency FOO (non ASF) that is declared to be
licensed under the license X but it actually isn't -- here's why..."

We trust our upstream dependencies, but I don't think we can verify them as a
foundation. Hence we keep relying on that feedback coming from the corporate
side.

Thanks,
Roman.

> > > 4) If some images are not acceptable, should we bring them in and release
> > > them in a community-managed registry?
> >
> > For the Apache Yetus docker image, we're including everything that
> > the project supports.  *shrugs*
> >
>
> Yeah. That's perfectly OK in many cases. Our docker image is also
> self-containing. However, Airflow is a bit special here. Airflow is an
> orchestrator which means that it can talk to many different services. We
> have 58(!) "providers" - basically external services we can talk to. And
> many of those services require many dependencies - for example Cassandra
> (for production installation) requires cython-compiled driver (for
> performance) and it takes 10 minutes to build it. The smaller the images -
> the better - therefore the images we release contain the most "popular"
> providers rather than all of them, but the user can build their own image
> from the sources if they want and add those extra dependencies they need.
>
> Another problem is - helm chart uses - by definition - a collection of
> images - so we will always have some images that helm chart depends on
> (pgbouncer is a good example). So it cannot be really self-contained. We
> need to have dependencies, but the question is about "who controls them :)"
>
> J.
>
> --
>
> Jarek Potiuk
> Polidea  | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>


Re: Controlling the images used for the builds/releases

2020-06-23 Thread Jarek Potiuk
>
> My understanding the bigger problem is the license of the dependency (and
> their dependencies) rather than the official/unofficial status.  For Apache
> Yetus' test-patch functionality, we defaulted all of our plugins to off
> because we couldn't depend upon GPL'd binaries being available or giving
> the impression that they were required.  By doing so, it put the onus on
> the user to specifically enable features that depends upon GPL'd
> functionality.  It also pretty much nukes any idea of being user friendly.
> :(
>

Indeed - Licensing is important, especially for source code redistribution.
We used to have some GPL-install-on-your-own-if-you-want in the past but
those dependencies are gone already.


>
> > 2) If it's not - how do we determine which images are "officially
> > maintained".
>
> Keep in mind that Docker themselves brand their images as
> 'official' when they actually come from Docker instead of the organizations
> that own that particular piece of software.  It just adds to the complexity.
>

Not really. We actually plan to make our own Apache Airflow Docker image an
official one. Docker has very clear guidelines on how to make images
"official" (https://docs.docker.com/docker-hub/official_images/) and
there is quite a long list of those:
https://github.com/docker-library/official-images/tree/master/library -
most of them maintained by the "authors" of the image. Docker has a
dedicated team that reviews and checks those images, and they encourage
the "authors" to maintain them. Quote from Docker's docs: "While it is
preferable to have upstream software authors maintaining their
corresponding Official Images, this is not a strict requirement."

>
> > 3) If yes - how do we put the boundary - when image is acceptable? Are
> > there any criteria we can use or/ constraints we can put on the
> > licences/organizations releasing the images we want to make dependencies
> > for released code of ours?
>
> License means everything.
>

For software distribution - true. It is the "blocker". But I think my
question goes a bit beyond that - i.e. whether it's OK to encourage/depend
on work maintained by organizations other than Apache if the images are not
"official". My take is that it's likely OK to depend on it, provided that
there is some kind of statement from those organizations that they maintain it.

An example risk I see:
Airflow users depend heavily on the helm chart to install Airflow - what
happens if the community agrees to implement something that the
organization does not want to implement (for whatever reason)?


>
> > 4) If some images are not acceptable, should we bring them in and release
> > them in a community-managed registry?
>
> For the Apache Yetus docker image, we're including everything that
> the project supports.  *shrugs*
>

Yeah. That's perfectly OK in many cases. Our docker image is also
self-contained. However, Airflow is a bit special here. Airflow is an
orchestrator, which means that it can talk to many different services. We
have 58(!) "providers" - basically external services we can talk to. And
many of those services require many dependencies - for example Cassandra
(for production installation) requires a Cython-compiled driver (for
performance) and it takes 10 minutes to build it. The smaller the images,
the better - therefore the images we release contain the most "popular"
providers rather than all of them, but users can build their own image
from the sources if they want and add the extra dependencies they need.

Another problem is that a helm chart uses - by definition - a collection of
images, so we will always have some images that the helm chart depends on
(pgbouncer is a good example). So it cannot really be self-contained. We
need to have dependencies, but the question is about "who controls them" :)
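
To make the "who controls them" question a bit more concrete, here is a minimal sketch (not Airflow code) of listing which image references used by a chart live outside the namespaces the community controls. The `CONTROLLED_NAMESPACES` set and both helper names are hypothetical; the image names are simply the ones mentioned in this thread.

```python
# Sketch only: CONTROLLED_NAMESPACES and both helpers are hypothetical names.
CONTROLLED_NAMESPACES = {"apache"}  # assumption: namespaces the PMC can push to


def image_namespace(reference: str) -> str:
    """Return the namespace part of a reference such as 'org/name:tag'."""
    # Drop a digest, if any, then drop a tag found in the last path segment.
    name = reference.split("@", 1)[0]
    last_segment = name.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name = name[: name.rfind(":")]
    parts = name.split("/")
    # A bare name such as 'postgres' has no explicit namespace.
    return parts[-2] if len(parts) >= 2 else ""


def externally_controlled(references):
    """List the references whose namespace is not community-controlled."""
    return [r for r in references if image_namespace(r) not in CONTROLLED_NAMESPACES]


chart_images = [
    "apache/airflow",
    "astronomerinc/ap-statsd-exporter:0.11.0",
    "astronomerinc/ap-pgbouncer:1.8.1",
    "astronomerinc/ap-pgbouncer-exporter:0.5.0-1",
]
print(externally_controlled(chart_images))
```

Note that a bare image name with no namespace is treated as externally controlled here, which is a deliberately conservative choice for this sketch.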

J.

-- 

Jarek Potiuk
Polidea  | Principal Software Engineer

M: +48 660 796 129 <+48660796129>


Re: Controlling the images used for the builds/releases

2020-06-22 Thread Allen Wittenauer


> On Jun 22, 2020, at 6:52 AM, Jarek Potiuk  wrote:
> 1) Is this acceptable to have a non-officially released image as a
> dependency in released code for the ASF project?

My understanding is that the bigger problem is the license of the dependency
(and their dependencies) rather than the official/unofficial status.  For Apache
Yetus' test-patch functionality, we defaulted all of our plugins to off because
we couldn't depend upon GPL'd binaries being available or risk giving the
impression that they were required.  By doing so, it put the onus on the user to
specifically enable features that depend upon GPL'd functionality.  It also
pretty much nukes any idea of being user friendly. :(
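
The opt-in approach described above can be sketched roughly as follows. This is not Apache Yetus code; the plugin names and the `enabled_plugins` helper are hypothetical and only illustrate defaulting everything to off, so that nothing runs unless the user asks for it.

```python
# Hypothetical plugin registry, for illustration only.
AVAILABLE_PLUGINS = ["style-check", "unit-tests"]


def enabled_plugins(requested):
    """Everything is off by default; only explicitly requested plugins run."""
    wanted = set(requested)
    return [name for name in AVAILABLE_PLUGINS if name in wanted]


print(enabled_plugins([]))               # nothing runs unless requested
print(enabled_plugins(["style-check"]))  # the user opted in explicitly
```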

> 2) If it's not - how do we determine which images are "officially
> maintained".

Keep in mind that Docker themselves brand their images as 'official' 
when they actually come from Docker instead of the organizations that own that 
particular piece of software.  It just adds to the complexity.

> 3) If yes - how do we put the boundary - when image is acceptable? Are
> there any criteria we can use or/ constraints we can put on the
> licences/organizations releasing the images we want to make dependencies
> for released code of ours?

License means everything.

> 4) If some images are not acceptable, should we bring them in and release
> them in a community-managed registry?

For the Apache Yetus docker image, we're including everything that the 
project supports.  *shrugs*



Re: Controlling the images used for the builds/releases

2020-06-22 Thread Jarek Potiuk
> Sure. Build tools can even be GPL, and something like a linter isn't a
> hard dependency for Airflow anyway. +1
>

Indeed.

> > But we are just about to start releasing Production Image and Helm Chart
> > for Apache Airflow and I started to wonder if this is still acceptable
> > practice when - by releasing the code - we make our users depend on those
> > images.
>
> Just checking: surely a production Airflow Docker image doesn't have
> hadolint in it?
>

Correct, it does not :). It's just for the earlier A) case.


>
> > We are going to officially support both - image and helm chart by the
> > community and once we release the image and helm chart officially, those
> > external images and downloads will become dependencies to our official
> > "releases". We are allowing our users to use our official Dockerfile
> > to build a new image (with user's configuration) and Helm Chart is going to
> > be officially available for anyone to install Airflow.
>
> Sounds like a good step for your project.
>
Indeed.


> First question: Is it the *only* way you can run Airflow? Does it end up
> in the source tarball? If so, you need to review the ASF licensing
> requirements and make sure you're not in violation there. (Just Checking!)
>

It's one of the ways. You don't *have to* use the helm chart or docker image.
We also have official INSTALL instructions that simply install Airflow directly
from the sources using just Python's pip dependencies (provided that you have
all the required apt-deps installed).

And in the sources, we just have the names/references to the images (not the
images themselves). And they are all released with liberal licenses when it
comes to using them.


>
> Second: Most of these look like *testing* dependencies, not runtime
> dependencies.
>
>
True. The only "runtime" deps are the Astronomer ones:

> > - astronomerinc/ap-statsd-exporter:0.11.0
> > - astronomerinc/ap-pgbouncer:1.8.1
> > - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
>

> How hard would it be for the Airflow community to import the Dockerfiles
> and build the images themselves? And keep those imported forks up to
> date? We do this a lot in CouchDB for our dependencies (not just Docker)
> where it's a personal project of someone in the community, or even where
> it's some corporate thing that we want to be sure we don't break on when
> they implement a change for their own reasons.
>
>
Not hard. I think it would be a rather easy task, as we automate everything,
including building and testing our own images (production and CI ones for
every pull request). I even wrote a description of how to build a robust
setup where GitHub Actions and DockerHub work together and images are built
and cached in the GitHub Registry but then published nightly to DockerHub
(the Infra team asked me to do that when I shared it with them):
https://cwiki.apache.org/confluence/display/INFRA/Github+Actions+to+DockerHub
So we are fully capable of doing it.


> Automating building these and pushing them isn't hard these days, even
> on ASF hardware if you want. The nice thing about Docker is that, for
> you to do that, you really only need "docker build" (or "docker buildx"
> for cross-platform) and a build machine or two to keep things current.


Indeed. We use GitHub Actions and the DockerHub Build integration, so that
would be rather easy.
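
As an illustration of how little is needed, the rebuild could be driven by something like the following sketch. It only assembles `docker build` and `docker push` command lines and prints them; the repository name `apachedev/pgbouncer`, the tag, and the `docker/pgbouncer` build context are hypothetical and not the project's actual layout.

```python
import shlex


def rebuild_commands(repository: str, tag: str, context_dir: str):
    """Return the docker command lines to rebuild and publish one image."""
    image = f"{repository}:{tag}"
    return [
        ["docker", "build", "-t", image, context_dir],
        ["docker", "push", image],
    ]


# Hypothetical repository, tag and build context, for illustration only.
commands = rebuild_commands("apachedev/pgbouncer", "1.8.1", "docker/pgbouncer")
for argv in commands:
    # Printed rather than executed; subprocess.run(argv, check=True) would run it.
    print(shlex.join(argv))
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems if they are later passed to `subprocess.run`.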


>
> > 4) If some images are not acceptable, should we bring them in and release
> > them in a community-managed registry?
>
> I don't think you need a dedicated registry, but I would recommend
> setting up your own Docker Hub user and pushing at least CI images you
> need there. (We have the couchdbdev user, for instance, images we keep
> up to date with all of our build/test dependencies for Jenkins use.) And
> of course there's a bunch of images under
> https://hub.docker.com/u/apache for many ASF projects at this point.
>

Yeah. We have our own DockerHub Airflow account where we publish (following
the CI process above) our CI and production images nightly:
https://hub.docker.com/repository/docker/apache/airflow
And indeed I thought about something separate like your couchdbdev user,
but thought about making it under the "apache" umbrella. However, you are quite
right that we likely do not need an "apache"-managed account for that;
we could easily have our own, and then it would be easier to have multiple
repositories under an apachedev user, for example. I think that is going to be
our setup eventually for dev dependencies.


> For runtime dependency "sidecars" for Helm and other Docker images, I
> don't have a strong opinion. If they're essential to bring-up for
> Airflow, I'd encourage you to bring them in-project and re-build them
> yourselves.


They are quite important for the Helm chart installation method. In this
case they are pgbouncer and the statsd daemon. The first one is there to
limit the number of Postgres connections opened to the database, the second
to monitor metrics of a running instance.
Both real

Re: Controlling the images used for the builds/releases

2020-06-22 Thread Joan Touzet
Hey Jarek, thanks for starting this thread. It's a thorny issue, for 
sure, especially because binary releases are not "official" from an ASF 
perspective.


(Of course, this is a technicality; the fact that your PMC is building 
these and linking them from project pages, and/or publishing them out as 
apache/ or top-level  at Docker Hub can be seen as a 
kind of officiality. It's just, for the moment, not an Official Act of 
the Foundation for legal reasons.)


On 22/06/2020 09:52, Jarek Potiuk wrote:

Hello Everyone,

I have a kind question and request for your opinions about using external
Docker images and downloaded binaries in the official releases for Apache
Airflow.

The question is: How much can we rely on those images being available in
those particular cases:

A) during static checks
B) during unit tests
C) for building production images for Airflow
D) for releasing production Helm Chart for Airflow

Some more explanation:

For a long time we have been doing A) and B) in Apache Airflow, following
the practice that when we find an image that is good for us and seems
"legit", we use it. Example -
https://hub.docker.com/r/hadolint/hadolint/dockerfile/ - HadoLint image to
check our Dockerfiles.  Since this is easy to change pretty much
immediately, and only used for building/testing, I have no problem with
this personally, and I think it saves a lot of the time and effort we would
otherwise spend maintaining some of those images.


Sure. Build tools can even be GPL, and something like a linter isn't a 
hard dependency for Airflow anyway. +1



But we are just about to start releasing Production Image and Helm Chart
for Apache Airflow and I started to wonder if this is still acceptable
practice when - by releasing the code - we make our users depend on those
images.


Just checking: surely a production Airflow Docker image doesn't have 
hadolint in it?



We are going to officially support both the image and the Helm chart as a
community, and once we release the image and Helm chart officially, those
external images and downloads will become dependencies of our official
"releases". We are allowing our users to use our official Dockerfile
to build a new image (with the user's configuration), and the Helm Chart is
going to be officially available for anyone to install Airflow.


Sounds like a good step for your project.


The Docker images that we are using are from various sources:

1) officially maintained images (Python, KinD, Postgres, MySQL for example)
2) images released by organizations that released them for their own
purpose, but they are not "officially maintained" by those organizations
3) images released by private individuals

While 1) is perfectly OK for both image and helm chart, I think for 2) and
3) we should bring the images to Airflow community management.


I agree, and would go a step further, see below.


Here is the list of those images I found that we use:

- aneeshkj/helm-unittest
- ashb/apache-rat:0.13-1
- godatadriven/krb5-kdc-server
- polinux/stress (?)
- osixia/openldap:1.2.0
- astronomerinc/ap-statsd-exporter:0.11.0
- astronomerinc/ap-pgbouncer:1.8.1
- astronomerinc/ap-pgbouncer-exporter:0.5.0-1

Some of those images are released by organizations that are strong
stakeholders in the project (Astronomer especially). Some other images are
by organizations that are still part of the community but not as strong
stakeholders (GoDataDriven) - some others are by private individuals who
are contributors (Ash, Aneesh) and some others are not-at-all connected to
Apache Airflow (polinux, osixia).
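
To make the grouping above easier to see, here is a small Python sketch
that groups the listed image references by their Docker Hub namespace. It
assumes plain Docker Hub short references (no registry host or port), and
the helper names are made up for illustration. It only shows who publishes
each image; whether a publisher counts as "official" remains a policy
question that the code does not answer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Image references exactly as listed in the message above.
IMAGES = [
    "aneeshkj/helm-unittest",
    "ashb/apache-rat:0.13-1",
    "godatadriven/krb5-kdc-server",
    "polinux/stress",
    "osixia/openldap:1.2.0",
    "astronomerinc/ap-statsd-exporter:0.11.0",
    "astronomerinc/ap-pgbouncer:1.8.1",
    "astronomerinc/ap-pgbouncer-exporter:0.5.0-1",
]


def namespace(image: str) -> str:
    """Return the Docker Hub namespace of a short image reference.

    References without a "/" (for example "python:3.8") are Docker Hub
    official images, which live under the "library" namespace.
    Assumption: no registry host is present in the reference.
    """
    name = image.split(":", 1)[0]  # drop the tag, if any
    if "/" not in name:
        return "library"
    return name.split("/", 1)[0]


def group_by_namespace(images: Iterable[str]) -> Dict[str, List[str]]:
    """Group image references by the namespace that publishes them."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for image in images:
        groups[namespace(image)].append(image)
    return dict(groups)


if __name__ == "__main__":
    for owner, refs in sorted(group_by_namespace(IMAGES).items()):
        print(f"{owner}: {', '.join(refs)}")
```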

For me quite clearly - we are ok to rely on "officially" maintained images
and we are not ok to rely on images released by individuals in this case.
But there is a range of images in-between that I have no clarity about.

So my questions are:

1) Is it acceptable to have a non-officially released image as a
dependency in released code for an ASF project?


First question: Is it the *only* way you can run Airflow? Does it end up 
in the source tarball? If so, you need to review the ASF licensing 
requirements and make sure you're not in violation there. (Just Checking!)


Second: Most of these look like *testing* dependencies, not runtime 
dependencies.



2) If it's not - how do we determine which images are "officially
maintained"?

3) If yes - where do we put the boundary for when an image is acceptable?
Are there any criteria we can use or constraints we can put on the
licences/organizations releasing the images we want to make dependencies
for released code of ours?


How hard would it be for the Airflow community to import the Dockerfiles 
and build the images themselves? And keep those imported forks up to 
date? We do this a lot in CouchDB for our dependencies (not just Docker) 
where it's a personal project of someone in the community, or even where 
it's some corporate thing that we want to be sure we don't break on when 
they implement a change for their own reasons.


Automating building thes