On Jan 3, 2014, at 5:25 AM, Simon IJskes - QCG <si...@qcg.nl> wrote:

> In order to gain some time to discuss this first i will vote -1.
> 
> First, we decided to NOT remove velocity builder.

When I read the email chain, my impression was that we wanted to remove it (to 
quote you Sim, “To be honest, I hate it”), but there was a dependency on it in 
the ‘extras’ folder that was added in the trunk branch.  As there is no 
‘extras’ in the 2.2 branch, and that is what this patch applies to, I thought 
it was fair to remove VelocityConfigurationBuilder from the 2.2 branch.   
Perhaps we should revisit the ConfigurationBuilder approach in another thread.  
For now I’ll spin another patch that doesn’t remove 
VelocityConfigurationBuilder.

> 
> Second, no need to remove the jars as specified in your own comments on 
> RIVER-432.
> 
> Pulling in external jars at compile time, shall we start here?
> 
> They are already in the svn. They are already in the build scripts. What does 
> this patch fix? No legal problems?
> 

Apache policy is somewhat unclear on this point.  One needs to examine the 
mailing lists for clues on what we should really do.  I will argue that:

1 - The fundamental distribution model of Apache is source code, not binaries.
2 - Distributing binaries is tolerated but not encouraged.  Since the svn 
repository can be seen as a distribution point, binaries in svn are also 
tolerated but not encouraged.
3 - Downloading dependency binaries at build time is technologically easy, 
provides the same guarantees as putting them in svn, and avoids the question of 
effectively distributing someone else’s code.

All these together suggest that although we’re technically OK to put dependency 
jars in a “-deps” package (note that the status quo _is_ unacceptable - at the 
very least, we need to restructure the dependencies into a “-deps” binary 
package), there is some policy uncertainty which we avoid totally by having 
dependencies downloaded from a known-good source at build time.

Let’s examine these points:

1 - The fundamental distribution model of Apache is source code, not binaries.  
Apache began with httpd.  Back in those days, “Open Source” software was 
distributed in source form only, simply because it was mostly intended for Unix 
systems (then later Linux).  I recall the first release of Perl coming down as 
a ten-part uunet news message.  Part of this distribution model was practical 
necessity - System differences made it necessary to compile your software on 
the hardware it was going to run on.  Part of it was open-source philosophy.  
Having the source code meant that you could take a look at it and verify that 
it wasn’t hazardous to your operations.  

In any case, the way we used to use open source software was “./configure; 
make; make install”.  If the software had dependencies, you built them from 
source, for the same reasons.

Now, as time has gone on, we’ve gotten used to having the JVM as a common 
runtime, and jar files as a common binary distribution medium.  But the Apache 
Foundation’s mandate is to produce open source software that is freely usable 
under the Apache License.  That means source code is at the heart of Apache, 
despite the rest of the world’s comfort with binaries.  Hence Roy’s statements 
in (1):

> Class files are not open source.  Jar files filled with class files
> are not open source.  The fact that they are derived from open source
> is applicable only to what we allow projects to be dependent upon,
> not what we vote on as a release package.  Release votes are on verified
> open source artifacts.  Binary packages are separate from source packages.
> One cannot vote to approve a release containing a mix of source and
> binary code because the binary is not open source and cannot be verified
> to be safe for release (even if it was derived from open source).
> 
> I thought that was frigging obvious.  Why do I need to write documentation
> to explain something that is fundamental to the open source definition?

He’s talking about binary packages, not jar files in svn, but I read that (and 
many other emails) as a distaste for binary distributions.

In fact, if you look at Apache httpd’s download page, it doesn’t appear that 
the Apache project publishes any Unix or Linux binaries.  They leave that to 
other organizations.

2 - Distributing binaries is tolerated but not encouraged.  Since the svn 
repository can be seen as a distribution point, binaries in svn are also 
tolerated but not encouraged.

It’s hard to find a single reference that encapsulates this outlook, but that’s 
the impression I get from reading the various mailing lists.  For instance, Sam 
Ruby says (2):
> IMO, our projects release source. So, our projects should not maintain
> object/binary artifacts in their svn release tree, regardless of license
> (category a or b).

There is some debate on whether the svn tree should be considered a 
distribution point.  Incubator releases are regularly called out for not having 
“NOTICE” and “LICENSE” files at all reasonable checkout points in svn.  
[LEGAL-26] (https://issues.apache.org/jira/browse/LEGAL-26) concerns this and 
remains unresolved.

Doug Cutting (3) says:
> On Mon, Sep 16, 2013 at 2:50 AM, Stephen Connolly
> <stephen.alan.conno...@gmail.com> wrote:
> > * Source control is not an Apache distribution and hence we do not need to
> > have LICENSE and NOTICE files in source control, it can be a nice
> > convenience, but there is no *requirement*.
> 
> I think perhaps you're looking for clear lines where things are
> actually a bit fuzzy.  Certainly releases are official distributions
> and need LICENSE and NOTICE files.  That line is clear.  On the other
> hand, we try to discourage folks from thinking that source control is
> a distribution.  Rather we wish it to be considered our shared
> workspace, containing works in progress, not yet always ready for
> distribution to folks outside the foundation.  But, since we work in
> public, folks from outside the foundation can see our shared workspace
> and might occasionally mistake it for an official distribution.  We'd
> like them to still see a LICENSE and NOTICE file.  So it's not a
> hard-and-fast requirement that every tree that can possibly be checked
> out have a LICENSE and NOTICE file at its root, but it's a good
> practice for those trees that are likely to be checked out have them,
> so that folks who might consume them are well informed.

Again, he’s not talking directly about jar files in svn; however, I think his 
statement that “since we work in public, folks from outside the foundation can 
see our shared workspace and might occasionally mistake it for an official 
distribution” applies here.  Fundamentally, the decision on how and where to 
distribute ‘velocity.jar’ rightly belongs with the Velocity group and I don’t 
think we ought to redistribute it.

3 - Downloading dependency binaries at build time is technologically easy, 
provides the same guarantees as putting them in svn, and avoids the question of 
effectively distributing someone else’s code.

There doesn’t seem to be clear policy in the ASF on this, as evidenced by the 
frequent debates on it, and the lack of documentation.  I’ve tried to lay out 
an argument that having jars in svn is not encouraged by the ASF (really, it’s 
not in line with the ASF’s charter), even if it isn’t disallowed.  You may 
disagree, and I won’t claim I’ve made a strong argument, simply because the 
policy isn’t clear.  So instead of going through arguments that amount to 
differences of opinion on Apache policy, let’s use a technological solution 
that is simple, common, and avoids the question altogether, by automatically 
downloading the dependencies at build time.

Projects that use Maven do this automatic download as standard practice (that’s 
what Maven does, and that’s what the Maven Central infrastructure is there to 
support).  We don’t use Maven, which is fine (our customers have asked us to 
make our binaries available in Maven Central, and we’ve done that).  Apache Ivy 
is a popular add-on to Apache Ant that provides similar dependency resolution 
for an Ant-based build.

I was a little surprised how easy it was to persuade Ivy to get the required 
dependencies at build time.  The “ivy.xml” file is 39 lines including the ASL 
header (which by the way I forgot to include in the patch - I’ll fix that).  
There are about 50 lines added to ‘build.xml’ to download Ivy and then download 
the required jar files.
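
For anyone who hasn’t seen the pattern, bootstrapping Ivy from an Ant build 
usually looks roughly like the sketch below.  This is not the text of the 
patch: the project name, target names, property names, the Ivy version and the 
‘lib’ output directory are all assumptions chosen for illustration, loosely 
following the bootstrap approach shown in the Ivy documentation.

```xml
<!-- Illustrative sketch only; names, paths and the Ivy version are assumptions. -->
<project name="ivy-bootstrap-sketch" default="resolve"
         xmlns:ivy="antlib:org.apache.ivy.ant">

  <property name="ivy.install.version" value="2.3.0"/>
  <property name="ivy.jar.dir" value="${basedir}/ivy"/>
  <property name="ivy.jar.file" value="${ivy.jar.dir}/ivy.jar"/>

  <!-- Fetch the Ivy jar itself from Maven Central; skipped when -Doffline is set. -->
  <target name="download-ivy" unless="offline">
    <mkdir dir="${ivy.jar.dir}"/>
    <get src="https://repo1.maven.org/maven2/org/apache/ivy/ivy/${ivy.install.version}/ivy-${ivy.install.version}.jar"
         dest="${ivy.jar.file}" usetimestamp="true"/>
  </target>

  <!-- Make the Ivy Ant tasks available under the ivy: namespace. -->
  <target name="install-ivy" depends="download-ivy">
    <path id="ivy.lib.path">
      <fileset dir="${ivy.jar.dir}" includes="*.jar"/>
    </path>
    <taskdef resource="org/apache/ivy/ant/antlib.xml"
             uri="antlib:org.apache.ivy.ant"
             classpathref="ivy.lib.path"/>
  </target>

  <!-- Resolve the dependencies declared in ivy.xml and copy them under lib/. -->
  <target name="resolve" depends="install-ivy">
    <ivy:retrieve pattern="${basedir}/lib/[conf]/[artifact]-[revision].[ext]"/>
  </target>
</project>
```

With something like this in place, the compile targets would depend on 
‘resolve’ and pick the jars up from the retrieved directory instead of from 
svn.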

So, given that the status quo seems to be unacceptable (Roy talks about not 
having jar files in the open-source trees, only in “-deps” and “tools” trees), 
we have two options:

(a) - restructure the svn repository and the build to allow a separate “-deps” 
distribution.  This wouldn’t affect our binary distributions (note that I’m no 
longer using the term “binary release”), but to build from source, a user would 
have to download a separate archive, unpack it, and then copy those files into 
the directory that was unpacked from the source release.  This option 
effectively still has us distributing dependent binaries, which is not the goal 
of the ASF, just with a disclaimer that says “this isn’t an ASF release, it’s 
just a binary distribution put together by a committer for your convenience, so 
don’t feel you should trust it too much”.

(b) - use Ivy to get the jars from Maven Central automatically as part of the 
build.

I think (b) is the option that causes the least hassle for our downstream 
consumers, and not much hassle for us.


> Pulling external jars at compile time also makes it more difficult to certify 
> the software. In order to certify the software you need to establish baseline 
> that will be garanteed the same, even if you pull it from the archive 10 
> years later.

As I said above, Apache’s focus is creating software that can be built from 
source, not distributing binaries (note that QCG or another company might have 
a different focus, and is perfectly able to distribute binaries under the 
Apache license).  I think a reasonably prudent user would ask “How can I trust 
the ‘velocity.jar’ that’s included in this binary?”  And the answer would be 
either “because I built it from source and installed it in my corporate 
repository” (very cautious, but not unheard-of) or “It was published by the 
Velocity group to a trusted repository, Maven Central” (more common).

If you look in the “ivy.xml” file you’ll see that the dependencies are 
specified using Maven-style “group-artifact-version” coordinates.  Old versions 
are maintained in Maven Central forever.  I suppose it’s possible that a 
publisher could convince Maven Central to remove a version for some reason 
(security or licensing problems perhaps), but then, would we want to be 
distributing that version in a “-deps” package?
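
To illustrate those coordinates, an ‘ivy.xml’ covering the three libraries 
named in the RIVER-432 comment (asm, junit and mockito) might look something 
like the sketch below.  It is not the file from the patch; the module name, the 
configuration names and the revision numbers are assumptions.

```xml
<!-- Illustrative sketch only; module, conf names and revisions are assumptions. -->
<ivy-module version="2.0">
  <info organisation="org.apache.river" module="river"/>

  <configurations>
    <conf name="build" description="jars needed to compile"/>
    <conf name="test" description="jars needed only for the tests"/>
  </configurations>

  <dependencies>
    <!-- org / name / rev correspond to Maven groupId / artifactId / version -->
    <dependency org="asm" name="asm" rev="3.2" conf="build->default"/>
    <dependency org="junit" name="junit" rev="4.11" conf="test->default"/>
    <dependency org="org.mockito" name="mockito-all" rev="1.9.5" conf="test->default"/>
  </dependencies>
</ivy-module>
```

Because each dependency pins a fixed ‘rev’, a build run years later asks Maven 
Central for exactly the same published artifacts.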

I agree that it’s not enough to just say “you need to download such-and-such 
jar”, hence the automatic download managed by “Ivy” from Maven Central.

> It is not a high level project that builds on several frameworks. It is a 
> lowlevel system library. The stuff below the stack is minimal. The number of 
> jars we use is limited. Why bother?
> 

In the currently released branches, the dependencies are limited to ASM and 
Velocity.  Looking forward to the trunk branch and the qa_refactor branch, the 
number of external dependencies seems to be increasing (I don’t like that, 
because I also view River as a low-level system library, but I’m only one PMC 
member).  We need to get in front of the problem before we start distributing 
large numbers of dependencies.

This point rolls in with the general question of jar files in version control.  
I was always taught that version control was for source code, and putting 
binaries into version control was a bad idea.  In addition, there are practical 
problems - with older systems like cvs, even doing an update or commit 
effectively downloads the binaries, which slows things down if there are large 
binary files.  On newer distributed version control systems like git or 
Mercurial, the entire repository, including all versions of binary artifacts, 
comes down with the project checkout.  Currently, we have one version of 
relatively few jar files in our repository, so it’s not a major issue.  But it 
gets worse as time goes on.  So I suggest we work out the technology now to 
avoid the problem.

> Gr. Simon
> 

Thanks for the questions, Sim.  I hope you’ll come around to removing your ‘-1’.

Cheers,

Greg

Footnotes
——————

(1) - Roy Fielding - http://s.apache.org/roy-binary-deps-1
(2) - Sam Ruby - http://s.apache.org/r5
(3) - Doug Cutting - http://s.apache.org/GNP

> On 02-01-14 18:22, Greg Trasuk wrote:
>> 
>> Hello all:
>> 
>> Please have a look at the patch mentioned below and cast a vote on it.
>> 
>> The main idea is to remove the dependency jar files from the source 
>> distribution.  As a side effect of using Ivy, it becomes reasonable to 
>> remove them from the svn archive as well.  Also, the Velocity dependency was 
>> there to support the VelocityConfigurationBuilder.  We had discussed 
>> removing that component, so rather than move that dependency to Ivy, I’ve 
>> removed VelocityConfigurationBuilder.
>> 
>> It’s arguable whether the VelocityConfigurationBuider was part of the 
>> official Jini API (I see it as a utility, not API), so I don’t think this 
>> commit actually requires a vote.  However, it does seem like a significant 
>> change to the build process that ought to be reviewed.  So I propose to 
>> treat this as a “lazy consensus” vote, and will commit the change to the 2.2 
>> branch if there are no objections in 72 hours (i.e. 1730UTC 20140105).
>> 
>> At the same time, based on discussions over on gene...@incubator.apache.org, 
>> I’ll withdraw my assertion that we can’t have jars in svn.  Those interested 
>> may want to check out the thread at 
>> http://mail-archives.apache.org/mod_mbox/incubator-general/201312.mbox/%3C01B04CC4-95B8-4A39-BC16-04BAA4269B65%40stratuscom.com%3E
>> 
>> Cheers,
>> 
>> Greg.
>> 
>> On Jan 2, 2014, at 12:05 PM, Greg Trasuk (JIRA) <j...@apache.org> wrote:
>> 
>>> 
>>>     [ 
>>> https://issues.apache.org/jira/browse/RIVER-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>>  ]
>>> 
>>> Greg Trasuk updated RIVER-432:
>>> ------------------------------
>>> 
>>>    Attachment: river-2_2_remove_jars.diff
>>> 
>>> The attached patch for the 2.2 branch does the following:
>>> - removes the 'asm' directory and 'tests/lib' directories which currently 
>>> contain the asm library, mockito, and junit jars.
>>> - Modifies 'build.xml', 'common.xml', and adds 'ivy.xml' so that the Apache 
>>> Ivy ant plugin is downloaded at build time, and then used to retrieve the 
>>> libraries mentioned above from Maven Central.  This removes the need to 
>>> have the jar files in svn.
>>> - Removes (as per discussion 
>>> http://mail-archives.apache.org/mod_mbox/river-dev/201211.mbox/%3C509B99E3.6080800%40qcg.nl%3E)
>>>  the VelocityConfigBuilder, and associated Velocity jars.  Note that the 
>>> 'extras' folder is not present in the 2.2 branch, so Sim's last comments in 
>>> the thread do not apply.
>>> 
>>>> Jar files in svn and src distributions
>>>> --------------------------------------
>>>> 
>>>>                Key: RIVER-432
>>>>                URL: https://issues.apache.org/jira/browse/RIVER-432
>>>>            Project: River
>>>>         Issue Type: Bug
>>>>           Reporter: Greg Trasuk
>>>>        Attachments: river-2_2_remove_jars.diff
>>>> 
>>>> 
>>>> Recent traffic on the incubator lists has pointed out that including jar 
>>>> files for dependencies in the subversion repository and the source 
>>>> distributions is against Apache policy.
>>>> In River, the following libraries appear in the Subversion repository and 
>>>> the source distributions (these are from trunk, a smaller set appear in 
>>>> the 2.2 branch):
>>>> animal-sniffer
>>>> asm
>>>> bouncy-castle
>>>> dnsjava
>>>> high-scale-lib
>>>> rc-libs
>>>> velocity
>>>> They all have to go.  What are we using them for?  As I understand it, we 
>>>> were going to remove the VelocityConfigurationBuilder, so that's not a 
>>>> problem.  Some of the others are available from Maven Central, so we can 
>>>> get them at build time using Ivy or another build tool.  Which ones are 
>>>> actually required?  And where did they come from?
>>> 
>>> 
>>> 
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.1.5#6160)
>> 
> 
> 
> -- 
> QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
> Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397
