Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-28 Thread Pedro Giffuni

--- On Fri, 10/28/11, Jürgen Schmidt  wrote:



> > 4) I know you want ucpp there too, but since that
> > stuff is used in idlc, I think I'd prefer it in
> > idlc/source/preproc/
> > as it was before. No idea if we can use the system cpp
> > for the rest but that would probably make sense.

> Mmh, I would prefer to put it under ext_sources to make
> clear that it comes from an external source.
> 

That is pretty well covered by SVN and the NOTICE file,
but I was only brainstorming.

Just have fun :).

Pedro.



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-28 Thread Jürgen Schmidt

On 10/27/11 8:28 PM, Pedro Giffuni wrote:



--- On Thu, 10/27/11, Jürgen Schmidt  wrote:



In any case, yes.. I think this is the way to go. I am
just hoping there will be a way to opt out of those
components in favor of the system libraries when those
are available.


Me too, but we should move forward and we can change it at
any time when we have a better solution.



I am OK with that, but let me attempt to dump what I think:

1) You are not bringing in *anything* copyleft; that directory
will only be for the non-restrictive stuff that we need: ICU,
Boost, etc.

2) This will all have to be registered in the NOTICE file,
but since this is transitory and not really stuff we use in
base, we should start a new section there to separate it from
the stuff we do use in the core system.
3) We should probably move some of the stuff in soltools
there too (mkdepend).
4) I know you want ucpp there too, but since that stuff is
used in idlc, I think I'd prefer it in idlc/source/preproc/
as it was before. No idea if we can use the system cpp for the
rest but that would probably make sense.
Mmh, I would prefer to put it under ext_sources to make clear that
it comes from an external source.


Juergen



All just IMHO, I am pretty sure whatever you do is better than
what we have now :).

Pedro.




Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Pedro Giffuni
Hi Mathias;

--- On Thu, 10/27/11, Mathias Bauer  wrote:
...
>
> >> > In any case, yes.. I think this is the way to go. I am
> >> > just hoping there will be a way to opt out of those
> > 
> > I am OK with that, but let me attempt to dump what I think:
> > 
> > 1) You are not bringing in *anything* copyleft; that directory
> > will only be for the non-restrictive stuff that we need: ICU,
> > Boost, etc.
> 
> That should be doable. OTOH I'm wondering whether we should
> keep the copyleft tarballs at Apache Extras - it would allow
> us to still build with them (something that can be done outside
> the ASF infrastructure and is still appreciated, if I
> understood correctly).
>
I don't like that, but we will have to do it as a temporary
solution to avoid breaking the build until we replace
everything.

I think in the long run this is only interesting for Windows
binaries, due to the difficulties of getting those packages
from different places. On Linux/BSD distributions it makes
sense to use the prepackaged Mozilla, etc.
 
> > 3) We should probably move some of the stuff in soltools
> > there too (mkdepend).
> 
> That's something for later; ATM we should move the ext_src
> stuff into a secure place.
> 

Yes. Also for later: the SampleICC library is used to generate
a color profile required for PDF. I think we should just
generate the color profile somewhere outside the main build
and use it, avoiding the extra build cycles.

Another thing: we are excluding by default, with extreme
prejudice, both LGPL and MPL, but it will be convenient to
reevaluate that since we will have to use the prepackaged
Hunspell.

> If nobody else wants to do it, I can invest some time into
> that, but it might take some days.
> 

I won't do it on principle... I want them to
just go away ;-).

FWIW, Rob and I are trying to use an ooo- prefix on
Apache Extras. ooo-external-sources?

> It seems that the consensus is that we check the binary
> tarballs into trunk/ext_sources?!
> 

I am not sure about that; I think lazy consensus by whoever does
it first will win :).

Pedro.



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Mathias Bauer
On 27.10.2011 20:28, Pedro Giffuni wrote:

> --- On Thu, 10/27/11, Jürgen Schmidt  wrote:
> 
>> >
>> > In any case, yes.. I think this is the way to go. I am
>> > just hoping there will be a way to opt out of those
>> > components in favor of the system libraries when those
>> > are available.
>> 
>> Me too, but we should move forward and we can change it at
>> any time when we have a better solution.
>> 
> 
> I am OK with that, but let me attempt to dump what I think:
> 
> 1) You are not bringing in *anything* copyleft; that directory
> will only be for the non-restrictive stuff that we need: ICU,
> Boost, etc.

That should be doable. OTOH I'm wondering whether we should keep the
copyleft tarballs at Apache Extras - it would allow us to still build
with them (something that can be done outside the ASF infrastructure and
is still appreciated, if I understood correctly).

> 3) We should probably move some of the stuff in soltools
> there too (mkdepend).

That's something for later; ATM we should move the ext_src stuff into a
secure place.

If nobody else wants to do it, I can invest some time into that, but it
might take some days.

It seems that the consensus is that we check the binary tarballs into
trunk/ext_sources?!

Regards,
Mathias


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Rob Weir
On Thu, Oct 27, 2011 at 2:28 PM, Pedro Giffuni  wrote:
>
>
> --- On Thu, 10/27/11, Jürgen Schmidt  wrote:
>
>> >
>> > In any case, yes.. I think this is the way to go. I am
>> > just hoping there will be a way to opt out of those
>> > components in favor of the system libraries when those
>> > are available.
>>
>> Me too, but we should move forward and we can change it at
>> any time when we have a better solution.
>>
>
> I am OK with that, but let me attempt to dump what I think:
>
> 1) You are not bringing in *anything* copyleft; that directory
> will only be for the non-restrictive stuff that we need: ICU,
> Boost, etc.
>

I think it is like the SVN trunk.  We initially bring it all in, and
then remove the copyleft parts.  Of course if we can remove them
beforehand, that is good as well.  But whatever order we do the work in,
we cannot release until we've done the IP review.

The files are currently hosted here:

http://hg.services.openoffice.org/binaries/

Since the build currently depends on that, I think we want to move
those files to Apache now, rather than wait too long.

-Rob

> 2) This will all have to be registered in the NOTICE file,
> but since this is transitory and not really stuff we use in
> base, we should start a new section there to separate it from
> the stuff we do use in the core system.
> 3) We should probably move some of the stuff in soltools
> there too (mkdepend).
> 4) I know you want ucpp there too, but since that stuff is
> used in idlc, I think I'd prefer it in idlc/source/preproc/
> as it was before. No idea if we can use the system cpp for the
> rest but that would probably make sense.
>
> All just IMHO, I am pretty sure whatever you do is better than
> what we have now :).
>
> Pedro.
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Pedro Giffuni


--- On Thu, 10/27/11, Jürgen Schmidt  wrote:

> >
> > In any case, yes.. I think this is the way to go. I am
> > just hoping there will be a way to opt out of those
> > components in favor of the system libraries when those
> > are available.
> 
> Me too, but we should move forward and we can change it at
> any time when we have a better solution.
> 

I am OK with that, but let me attempt to dump what I think:

1) You are not bringing in *anything* copyleft; that directory
will only be for the non-restrictive stuff that we need: ICU,
Boost, etc.

2) This will all have to be registered in the NOTICE file,
but since this is transitory and not really stuff we use in
base, we should start a new section there to separate it from
the stuff we do use in the core system.
3) We should probably move some of the stuff in soltools
there too (mkdepend).
4) I know you want ucpp there too, but since that stuff is
used in idlc, I think I'd prefer it in idlc/source/preproc/
as it was before. No idea if we can use the system cpp for the
rest but that would probably make sense.

All just IMHO, I am pretty sure whatever you do is better than
what we have now :).

Pedro.
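
For illustration, a minimal sketch of the separate NOTICE section proposed
in point 2 above; the heading and the two entries are invented examples,
not actual NOTICE wording:

    ================================================================
    Transitory third-party sources kept under ext_sources
    (build-time dependencies, not used in the core system):
    ================================================================
    This product depends on ICU, developed by the ICU project
    (http://site.icu-project.org/), used under the ICU License.
    This product depends on the Boost C++ Libraries
    (http://www.boost.org/), used under the Boost Software License 1.0.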


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Jürgen Schmidt

On 10/27/11 6:13 PM, Pedro Giffuni wrote:



--- On Thu, 10/27/11, Jürgen Schmidt  wrote:
...


I think we still haven't finished on this topic, but it is somewhat
important to move forward with our IP clearance and the whole
development work.

So if nobody has real objections I would like to move forward with this
proposal, but would also like to change the proposed directory name from
"ext_sources" to "3rdparty".

Keep in mind that we use this directory to keep the current state
working, and with our ongoing work we will remove more and more stuff
from there.



I was about to bring in support for FreeBSD's fetch command
(somewhat like curl) in fetch-tarballs.sh and it looks like
you are now obsoleting it :-P.

In any case, yes.. I think this is the way to go. I am just
hoping there will be a way to opt out of those components in
favor of the system libraries when those are available.


Me too, but we should move forward and we can change it at any time
when we have a better solution.


Juergen



Pedro.





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Pedro Giffuni


--- On Thu, 10/27/11, Jürgen Schmidt  wrote:
...
> 
> I think we still haven't finished on this topic, but it is somewhat
> important to move forward with our IP clearance and the whole
> development work.
> 
> So if nobody has real objections I would like to move forward with
> this proposal, but would also like to change the proposed directory
> name from "ext_sources" to "3rdparty".
> 
> Keep in mind that we use this directory to keep the current state
> working, and with our ongoing work we will remove more and more stuff
> from there.
> 

I was about to bring in support for FreeBSD's fetch command
(somewhat like curl) in fetch-tarballs.sh and it looks like
you are now obsoleting it :-P.

In any case, yes.. I think this is the way to go. I am just
hoping there will be a way to opt out of those components in
favor of the system libraries when those are available.

Pedro.
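
For illustration, a minimal sketch of the kind of downloader fallback
fetch-tarballs.sh could carry, assuming POSIX sh and that any one of curl,
wget, or FreeBSD's fetch is installed; the helper name is hypothetical:

    # Hypothetical helper: use whichever downloader the host provides.
    download() {
        url=$1; out=$2
        if command -v curl >/dev/null 2>&1; then
            curl -L -o "$out" "$url"
        elif command -v wget >/dev/null 2>&1; then
            wget -O "$out" "$url"
        elif command -v fetch >/dev/null 2>&1; then
            fetch -o "$out" "$url"
        else
            echo "no downloader (curl/wget/fetch) found" >&2
            return 1
        fi
    }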



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Rob Weir
2011/10/27 Jürgen Schmidt :
> On 9/22/11 1:19 PM, Jürgen Schmidt wrote:
>
>> Ok, we have several arguments for and against, but no decision on how
>> we want to move forward. Let us take another look at it:
>>
>> 1. We have a working mechanism to get the externals from somewhere,
>> check the md5 sum, unpack, patch, and build.
>> 1.1 "Somewhere" is configurable during the configure step; initially the
>> externals are downloaded from http://hg.services.openoffice.org/binaries
>>
>> 2. Having the externals in the repository (SVN) won't be a big issue
>> because a checkout always downloads only the tip version.
>> 2.1 The SCM can be used to track the used version of the externals for a
>> specific OO version -> simply check out the version tag and everything is
>> in place ...
>>
>> 3. In a DSCM it would be a real problem over time because of the
>> increasing space taken by all versions.
>>
>> 4. We need a replacement for http://hg.services.openoffice.org/binaries
>> asap (who knows how long the server will be available).
>>
>> 5. Many developers probably work with a local clone of the repository,
>> using for example git svn or something else -> disadvantage of the
>> increasing space, but probably acceptable if a clean local trunk is
>> kept and updated.
>>
>> Proposed way to move forward:
>>
>> 1. Put the externals under .../trunk/ext_sources
>> .../trunk/ext_sources
>> .../trunk/main
>> .../trunk/extras
>> 2. Adapt configure to use this as the default and disable the download
>> (maybe reactivate it later if we move to a DSCM).
>> 3. Keep the process of checking the md5 sum as it is (for potential
>> later use).
>>
>> Any opinions or suggestions?
>>
>
> I think we still haven't finished on this topic, but it is somewhat
> important to move forward with our IP clearance and the whole
> development work.
>
> So if nobody has real objections I would like to move forward with this
> proposal, but would also like to change the proposed directory name from
> "ext_sources" to "3rdparty".
>
> Keep in mind that we use this directory to keep the current state working,
> and with our ongoing work we will remove more and more stuff from there.
>

So keep the current approach - tarballs with MD5 hash names, etc. -
just as before, but on Apache servers?

That sounds good to me.

> The adapted bootstrap mechanism will download the libraries from this new
> place.
>
> Juergen
>
>
>
>
>
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-27 Thread Jürgen Schmidt

On 9/22/11 1:19 PM, Jürgen Schmidt wrote:


Ok, we have several arguments for and against, but no decision on how we
want to move forward. Let us take another look at it:

1. We have a working mechanism to get the externals from somewhere,
check the md5 sum, unpack, patch, and build.
1.1 "Somewhere" is configurable during the configure step; initially the
externals are downloaded from http://hg.services.openoffice.org/binaries

2. Having the externals in the repository (SVN) won't be a big issue
because a checkout always downloads only the tip version.
2.1 The SCM can be used to track the used version of the externals for a
specific OO version -> simply check out the version tag and everything is
in place ...

3. In a DSCM it would be a real problem over time because of the
increasing space taken by all versions.

4. We need a replacement for http://hg.services.openoffice.org/binaries
asap (who knows how long the server will be available).

5. Many developers probably work with a local clone of the repository,
using for example git svn or something else -> disadvantage of the
increasing space, but probably acceptable if a clean local trunk is
kept and updated.

Proposed way to move forward:

1. Put the externals under .../trunk/ext_sources
.../trunk/ext_sources
.../trunk/main
.../trunk/extras
2. Adapt configure to use this as the default and disable the download
(maybe reactivate it later if we move to a DSCM).
3. Keep the process of checking the md5 sum as it is (for potential
later use).

Any opinions or suggestions?



I think we still haven't finished on this topic, but it is somewhat
important to move forward with our IP clearance and the whole
development work.

So if nobody has real objections I would like to move forward with this
proposal, but would also like to change the proposed directory name from
"ext_sources" to "3rdparty".

Keep in mind that we use this directory to keep the current state
working, and with our ongoing work we will remove more and more stuff
from there.


The adapted bootstrap mechanism will download the libraries from this 
new place.


Juergen
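
For illustration, a minimal sketch of the fetch-and-verify step from
point 1 of the proposal, assuming POSIX sh, curl, and GNU md5sum, and
assuming the tarballs keep their current MD5-hash file name prefix; the
URL, checksum, and file name are examples only:

    # Download one external tarball and verify it before unpacking.
    SRC=${TARBALL_ORIGIN:-http://hg.services.openoffice.org/binaries}
    md5=0123456789abcdef0123456789abcdef    # expected checksum (example)
    name=boost_1_39_0.tar.bz2
    curl -L -o "ext_sources/$md5-$name" "$SRC/$md5-$name"
    echo "$md5  ext_sources/$md5-$name" | md5sum -c - || exit 1
    tar xjf "ext_sources/$md5-$name"        # then patch and build as today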







Re: ICC generated profiles are copylefted (was Re: A systematic approach to IP review?)

2011-10-14 Thread Robert Burrell Donkin
On Fri, Oct 14, 2011 at 8:35 PM, Pedro Giffuni  wrote:
> Hi;
>
> When I saw this thread about machine-generated files, I never
> imagined we would be talking about code in OpenOffice.org, but
> I found that this file:
> icc/source/create_sRGB_profile/create_sRGB_profile.cpp
>
> indeed generates virally licensed code!
>
> I am proposing an obvious patch but I wanted the issue
> documented so I created bug 118512.

:-)

Robert


ICC generated profiles are copylefted (was Re: A systematic approach to IP review?)

2011-10-14 Thread Pedro Giffuni
Hi;

When I saw this thread about machine-generated files, I never
imagined we would be talking about code in OpenOffice.org, but
I found that this file:
icc/source/create_sRGB_profile/create_sRGB_profile.cpp

indeed generates virally licensed code!

I am proposing an obvious patch but I wanted the issue
documented so I created bug 118512.

enjoy ;)

Pedro.

--- On Thu, 9/29/11, Rob Weir  wrote:

> On Thu, Sep 29, 2011 at 1:53 AM,
> Dennis E. Hamilton wrote:
> > Let me recall the bidding a little here.  What I said was
> >
> > " It is unlikely that machine-generated files of any kind are
> > copyrightable subject matter."
> >
> > You point out that computer-generated files might incorporate
> > copyrightable subject matter.  I hadn't considered a hybrid case
> > where copyrightable subject matter would subsist in such a work,
> > and I have no idea how and to what extent the output qualifies
> > as a work of authorship, but it is certainly a case to be
> > reckoned with.
> >
> > Then there is the issue of macro expansion, template parameter
> > substitution, etc., and the cases become blurrier and blurrier.
> > For example, if I wrote a program and then put it through the
> > C Language pre-processor, in how much of the expanded result
> > does the copyright declared on the original subsist?  (I am
> > willing to concede, for purposes of argument, that the second
> > is a derivative work of the former, even though the derivation
> > occurred dynamically.)
> >
> > I fancy this example because it is commonplace that the
> > pre-processor incorporated files that have their own copyright
> > and license notices too.  Also, the original might include macro
> > calls, with parameters using macros defined in one or more of
> > those incorporated files.
> >
> 
> Under US law:  "Copyright protection subsists, in
> accordance with this
> title, in original works of authorship fixed in any
> tangible medium of
> expression, now known or later developed, from which they
> can be
> perceived, reproduced, or otherwise communicated, either
> directly or
> with the aid of a machine or device"
> 
> IANAL, but I believe Dennis is correct that a machine
> cannot be an
> author, in terms of copyright.  But the author of that
> program might.
> It comes down to who exactly put the work into a "fixed in
> any
> tangible medium of expression".
> 
> When I use an ordinary code editor, the machine acts as a
> tool that
> I use to create an original work. It is a tool, like a
> paintbrush.  In
> other cases, a tool can be used to transform a work.
> 
> If there is an original work in fixed form that I
> transform, then I
> may have copyright interest in the transformed work. That
> is how
> copyright law protects software binaries as well as source
> code.
> 
> As for the GNU Bison example, if I created the BNF, then I
> have
> copyright interest in the generated code.  That does
> not mean that I
> have exclusive ownership of all the generated code. 
> It might be a
> mashup of original template code from the Bison authors,
> along with
> code that is a transformation of my original grammar
> definition.  It
> isn't an either/or situation.  A work can have mixed
> authorship.
> 
> -Rob
> 
> 
> > I concede that copyrightable matter can survive into a
> > machine-generated file.  And I maintain that there can be
> > other conditions on the use of such a file other than by
> > virtue of it containing portions in which copyright subsists.
> > For example, I don't think the Copyright Office is going to
> > accept registration of compiled binaries any time soon, even
> > though there may be conditions on the license of the source
> > code that carries over onto those binaries.
> >
> > And, yes, it is murky all the way down.
> >
> >  - Dennis
> >
> > -Original Message-
> > From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org]
> > Sent: Wednesday, September 28, 2011 22:32
> > To: 'ooo-dev@incubator.apache.org'
> > Subject: RE: A systematic approach to IP review?
> >
> > Not to put too fine a point on this, but it sounds like you are
> > talking about boilerplate (and authored) template code that
> > Bison incorporates in its output.  It is also tricky because
> > the Bison output is computer source code.  That is an
> > interesting case.
> >
> > In the US, original work of authorship is pretty specific in
> > the case of literary works, which is where software copyright
> > falls the last time I checked (too long ago, though).  I
> > suspect that a license (in the contractual sense) can deal
> > with more than copyright.

Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)

2011-10-14 Thread Andrew Rist



On 10/14/2011 8:58 AM, Pedro Giffuni wrote:

--- On Fri, 10/14/11, Robert Burrell Donkin wrote:
...

A branch would save us from having say... 1000 commits
with header changes in the history.

Apache uses version control as the canonical record. It's
therefore essential to know why a header was changed and
by whom.


And of course the branch would be on SVN so the history for
the legal changes wouldn't be lost. Of course I meant this
only for the SGA, but ultimately it depends on the people
applying it, and from what I understand now, *I* won't be
touching any headers :).

thanks for all these explanations,

Pedro.


Robert & Pedro,

I intend to get started on the headers in the very near future.
My intention is to do a series of checkins by project/directory in the 
source tree, matching the changes to the grant(s).
I have a bit of sequencing of activities before I start, but this is 
next up on the list.


Andrew

--


Andrew Rist | Interoperability Architect
Oracle Corporate Architecture Group
Redwood Shores, CA | 650.506.9847


Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)

2011-10-14 Thread Pedro Giffuni

--- On Fri, 10/14/11, Robert Burrell Donkin wrote:
...
> 
> > A branch would save us from having say... 1000 commits
> > with header changes in the history.
> 
> Apache uses version control as the canonical record. It's
> therefore essential to know why a header was changed and
> by whom.
>

And of course the branch would be on SVN so the history for
the legal changes wouldn't be lost. Of course I meant this
only for the SGA, but ultimately it depends on the people
applying it, and from what I understand now, *I* won't be
touching any headers :).

thanks for all these explanations,

Pedro.


Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)

2011-10-14 Thread Robert Burrell Donkin
On Thu, Oct 13, 2011 at 9:29 PM, Pedro Giffuni  wrote:
>
>
> --- On Thu, 10/13/11, Robert Burrell Donkin wrote:
>
>> I recommend separating review from (automated) execution.
>> If this is done, a branch shouldn't be necessary...
>>
>
> Uhm.. can you elaborate a bit more?

For projects of this scale, some level of automated help is typically needed.

> A branch would save us from having say... 1000 commits with
> header changes in the history.

Apache uses version control as the canonical record. It's therefore
essential to know why a header was changed and by whom.

Robert


Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)

2011-10-13 Thread Pedro Giffuni


--- On Thu, 10/13/11, Robert Burrell Donkin wrote:

> I recommend separating review from (automated) execution.
> If this is done, a branch shouldn't be necessary...
> 

Uhm.. can you elaborate a bit more?

A branch would save us from having say... 1000 commits with
header changes in the history.

regards,

Pedro.
 


Re: How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)

2011-10-13 Thread Robert Burrell Donkin
On Sun, Oct 9, 2011 at 7:42 PM, Pedro Giffuni  wrote:
> Hi;
>
> Looking at how big, and mostly cosmetic but necessary, a
> change it will be to bring in all the SGA license changes,
> and given that it requires manual intervention and is not
> something that can be done in one huge mega commit ...
>
> I think we should create a branch for these changes and merge
> them in two steps, corresponding to both SGAs. This way
> merging CWSs and Bugzilla patches can go on without pain and
> people can get started on the header changes.

I recommend separating review from (automated) execution. If this is
done, a branch shouldn't be necessary...

Robert


How about a new branch for the legal changes? (was Re: A systematic approach to IP review?)

2011-10-09 Thread Pedro Giffuni
Hi;

Looking at how big, and mostly cosmetic but necessary, a
change it will be to bring in all the SGA license changes,
and given that it requires manual intervention and is not
something that can be done in one huge mega commit ...

I think we should create a branch for these changes and merge
them in two steps, corresponding to both SGAs. This way
merging CWSs and Bugzilla patches can go on without pain and
people can get started on the header changes.

cheers,

Pedro.


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-10-01 Thread Mathias Bauer
On 01.10.2011 00:17, Michael Stahl wrote:

> On 30.09.2011 21:24, Mathias Bauer wrote:
>> On 28.09.2011 17:32, Pedro F. Giffuni wrote:
> 
>> Another advantage of unpacking the tarballs: the patches will become
>> *real* patches that just contain changes of the original source code.
>> Often the patches nowadays contain additional files that we just need to
>> build the stuff in OOo (e.g. dmake makefiles) - they could be checked in
>> as regular files.
>> 
>> Currently keeping them as regular files is awkward because then they
>> need to be copied to the place the tarballs are unpacked to.
> 
> but this is just because dmake can only build source files in the same
> directory; imagine a more flexible gbuild external build target where the
> makefiles are in the source tree while the tarball gets unpacked in the
> workdir...

Sure, but we aren't there yet...

I didn't talk about the dmake makefiles that are used to unpack and
patch; I was talking about using dmake for building the external modules
that come with their own build system. The makefile.mk files in the root
directory of the external modules are not part of the patch, but some
patches contain makefile.mk files that are necessary to build the stuff,
either on all or only on some platforms.

Regards,
Mathias


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-30 Thread Pedro Giffuni


--- On Fri, 9/30/11, Mathias Bauer  wrote:

> 
> I'm not against unpacking the tarballs and applying the
> patches, but we should keep the patches somewhere so that
> updates could be done with the same effort as today.
> 
This could fly.

I like having the patches around. I would only request
that GNU patch not be a requirement to build OO.

Just for reference, FreeBSD's base has contrib area
(part of the release)
http://svnweb.freebsd.org/base/head/contrib/
and a vendor area which is an intermediate step:
(not part of the release)
http://svnweb.freebsd.org/base/vendor/


> Another advantage of unpacking the tarballs: the patches
> will become *real* patches that just contain changes of
> the original source code.

I definitely like that, yes.

Another thing is: we have to teach GNU configure to skip
building stuff when an installed binary package of the
same thing is available.

Pedro.
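
For illustration, a minimal sketch of such a configure-time opt-out,
assuming a pkg-config check in shell; the icu-i18n module name and the
variables are examples, not the actual OOo configure logic:

    # Prefer a system-installed ICU over the bundled tarball when found.
    if pkg-config --exists icu-i18n 2>/dev/null; then
        SYSTEM_ICU=YES
        ICU_CFLAGS=$(pkg-config --cflags icu-i18n)
        ICU_LIBS=$(pkg-config --libs icu-i18n)
    else
        SYSTEM_ICU=NO    # fall back to building from ext_sources
    fi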


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-30 Thread Michael Stahl
On 30.09.2011 21:24, Mathias Bauer wrote:
> On 28.09.2011 17:32, Pedro F. Giffuni wrote:

> Another advantage of unpacking the tarballs: the patches will become
> *real* patches that just contain changes of the original source code.
> Often the patches nowadays contain additional files that we just need to
> build the stuff in OOo (e.g. dmake makefiles) - they could be checked in
> as regular files.
> 
> Currently keeping them as regular files is awkward because then they
> need to be copied to the place the tarballs are unpacked to.

but this is just because dmake can only build source files in the same
directory; imagine a more flexible gbuild external build target where the
makefiles are in the source tree while the tarball gets unpacked in the
workdir...

> Regards,
> Mathias
> 




Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-30 Thread Mathias Bauer
On 28.09.2011 17:32, Pedro F. Giffuni wrote:
> FWIW;
>
> I don't like the patches because I can't really examine the code
> well; besides, this is something the VCS handles acceptably:
> commit the original source code and then apply the patches in a
> different commit. If we start with up-to-date versions there
> would not be much trouble.

I'm not against unpacking the tarballs and applying the patches, but we
should keep the patches somewhere so that updates could be done with the
same effort as today.

Another advantage of unpacking the tarballs: the patches will become
*real* patches that just contain changes of the original source code.
Often the patches nowadays contain additional files that we just need to
build the stuff in OOo (e.g. dmake makefiles) - they could be checked in
as regular files.

Currently keeping them as regular files is awkward because then they
need to be copied to the place the tarballs are unpacked to.

Regards,
Mathias
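
For illustration, a minimal sketch of the two-commit flow quoted above
(pristine sources first, patches as their own reviewable commit),
assuming svn and GNU patch; the module and file names are examples only:

    # 1) Import the pristine upstream sources as their own commit.
    tar xjf ext_sources/boost_1_39_0.tar.bz2 -C 3rdparty/
    svn add 3rdparty/boost_1_39_0
    svn commit -m "3rdparty: import pristine Boost 1.39.0" 3rdparty
    # 2) Apply the OOo build patches in a second commit, so the diff
    #    against upstream stays visible in the history.
    patch -d 3rdparty/boost_1_39_0 -p0 < patches/boost_1_39_0.patch
    svn commit -m "3rdparty: apply OOo patches to Boost" 3rdparty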


Re: A systematic approach to IP review?

2011-09-29 Thread Rob Weir
On Thu, Sep 29, 2011 at 1:53 AM, Dennis E. Hamilton
 wrote:
> Let me recall the bidding a little here.  What I said was
>
> " It is unlikely that machine-generated files of any kind are copyrightable 
> subject matter."
>
> You point out that computer-generated files might incorporate copyrightable 
> subject matter.  I hadn't considered a hybrid case where copyrightable 
> subject matter would subsist in such a work, and I have no idea how and to 
> what extend the output qualifies as a work of authorship, but it is certainly 
> a case to be reckoned with.
>
> Then there is the issue of macro expansion, template parameter substitution, 
> etc., and the cases become blurrier and blurrier.  For example, if I wrote a 
> program and then put it through the C Language pre-processor, in how much of 
> the expanded result does the copyright declared on the original subsist?  (I 
> am willing to concede, for purposes of argument, that the second is a 
> derivative work of the former, even though the derivation occurred 
> dynamically.)
>
> I fancy this example because it is commonplace that the pre-processor 
> incorporated files that have their own copyright and license notices too.  
> Also, the original might include macro calls, with
> parameters using macros defined in one or more of those incorporated files.
>

Under US law:  "Copyright protection subsists, in accordance with this
title, in original works of authorship fixed in any tangible medium of
expression, now known or later developed, from which they can be
perceived, reproduced, or otherwise communicated, either directly or
with the aid of a machine or device"

IANAL, but I believe Dennis is correct that a machine cannot be an
author, in terms of copyright.  But the author of that program might.
It comes down to who exactly put the work into a "fixed in any
tangible medium of expression".

When I use an ordinary code editor, the machine acts as a tool that
I use to create an original work. It is a tool, like a paintbrush.  In
other cases, a tool can be used to transform a work.

If there is an original work in fixed form that I transform, then I
may have copyright interest in the transformed work. That is how
copyright law protects software binaries as well as source code.

As for the GNU Bison example, if I created the BNF, then I have
copyright interest in the generated code.  That does not mean that I
have exclusive ownership of all the generated code.  It might be a
mashup of original template code from the Bison authors, along with
code that is a transformation of my original grammar definition.  It
isn't an either/or situation.  A work can have mixed authorship.

-Rob


> I concede that copyrightable matter can survive into a machine-generated 
> file.  And I maintain that there can be other conditions on the use of such a 
> file other than by virtue of it containing portions in which copyright 
> subsists.  For example, I don't think the Copyright Office is going to accept 
> registration of compiled binaries any time soon, even though there may be 
> conditions on the license of the source code that carries over onto those 
> binaries.
>
> And, yes, it is murky all the way down.
>
>  - Dennis
>
> -Original Message-
> From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org]
> Sent: Wednesday, September 28, 2011 22:32
> To: 'ooo-dev@incubator.apache.org'
> Subject: RE: A systematic approach to IP review?
>
> Not to put too fine a point on this, but it sounds like you are talking about 
> boilerplate (and authored) template code that Bison incorporates in its 
> output.  It is also tricky because the Bison output is computer source code.  
> That is an interesting case.
>
> In the US, original work of authorship is pretty specific in the case of 
> literary works, which is where software copyright falls the last time I 
> checked (too long ago, though).  I suspect that a license (in the contractual 
> sense) can deal with more than copyright.  And, if Bison spits out copyright 
> notices, they still only apply to that part of the output, if any, that 
> qualifies as copyrightable subject matter.
>
> Has the Bison claim ever been tested in court?  Has anyone been pursued or 
> challenged for infringement? I'm just curious.
>
>  - Dennis
>
> -Original Message-
> From: Norbert Thiebaud [mailto:nthieb...@gmail.com]
> Sent: Wednesday, September 28, 2011 22:11
> To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org
> Subject: Re: A systematic approach to IP review?
>
> On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton
>  wrote:
>> I'll stand by my original statement.
>>
>> I'm not going to get into the Pixar case since it doesn't apply here.

RE: A systematic approach to IP review?

2011-09-28 Thread Dennis E. Hamilton
Let me recall the bidding a little here.  What I said was

" It is unlikely that machine-generated files of any kind are copyrightable 
subject matter."

You point out that computer-generated files might incorporate copyrightable 
subject matter.  I hadn't considered a hybrid case where copyrightable subject 
matter would subsist in such a work, and I have no idea how and to what extent 
the output qualifies as a work of authorship, but it is certainly a case to be 
reckoned with.

Then there is the issue of macro expansion, template parameter substitution, 
etc., and the cases become blurrier and blurrier.  For example, if I wrote a 
program and then put it through the C Language pre-processor, in how much of 
the expanded result does the copyright declared on the original subsist?  (I am 
willing to concede, for purposes of argument, that the second is a derivative 
work of the former, even though the derivation occurred dynamically.) 

I fancy this example because it is commonplace that the pre-processor 
incorporated files that have their own copyright and license notices too.  
Also, the original might include macro calls, with
parameters using macros defined in one or more of those incorporated files.

I concede that copyrightable matter can survive into a machine-generated file.  
And I maintain that there can be other conditions on the use of such a file 
other than by virtue of it containing portions in which copyright subsists.  
For example, I don't think the Copyright Office is going to accept registration 
of compiled binaries any time soon, even though there may be conditions on the 
license of the source code that carries over onto those binaries.

And, yes, it is murky all the way down.

 - Dennis

-Original Message-
From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org] 
Sent: Wednesday, September 28, 2011 22:32
To: 'ooo-dev@incubator.apache.org'
Subject: RE: A systematic approach to IP review?

Not to put too fine a point on this, but it sounds like you are talking about 
boilerplate (and authored) template code that Bison incorporates in its output. 
 It is also tricky because the Bison output is computer source code.  That is 
an interesting case.  

In the US, original work of authorship is pretty specific in the case of 
literary works, which is where software copyright falls the last time I checked 
(too long ago, though).  I suspect that a license (in the contractual sense) 
can deal with more than copyright.  And, if Bison spits out copyright notices, 
they still only apply to that part of the output, if any, that qualifies as 
copyrightable subject matter.  

Has the Bison claim ever been tested in court?  Has anyone been pursued or 
challenged for infringement? I'm just curious.  

 - Dennis

-Original Message-
From: Norbert Thiebaud [mailto:nthieb...@gmail.com] 
Sent: Wednesday, September 28, 2011 22:11
To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org
Subject: Re: A systematic approach to IP review?

On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton
 wrote:
> I'll stand by my original statement.
>
> I'm not going to get into the Pixar case since it doesn't apply here.

I did not say it applied to the Visual Studio generated cruft... I
merely commented on the blanket assertion that 'computer generated =>
no copyright'.
>
> The Bison manual may have license conditions on what can be done with the
> generated artifact, but I suggest that is not about copyrightable subject
> matter in the artifact.
Actually it is. The only claim they could legally have _is_ on the
generated bits that are substantial pieces of code copied from templates
they provide, namely in the case of a Bison-generated parser the whole
parser skeleton needed to exploit the generated state graph. The whole
paragraph is about the copyright disposition of these bits, and in the
case of Bison they explicitly grant you a license to use these bits in
the 'normal' use case... My point being that the existence of that
paragraph also disproves the assertion that 'computer generated => no
copyright'.

You could write a program that prints itself... the mere fact that it
prints itself does not mean you lose the copyright on your program...

That being said, I do think you are in the clear with the Visual
Studio generated cruft... but not merely because there is 'computer
generation' involved.


Norbert



RE: A systematic approach to IP review?

2011-09-28 Thread Dennis E. Hamilton
Not to put too fine a point on this, but it sounds like you are talking about 
boilerplate (and authored) template code that Bison incorporates in its output. 
 It is also tricky because the Bison output is computer source code.  That is 
an interesting case.  

In the US, original work of authorship is pretty specific in the case of 
literary works, which is where software copyright falls the last time I checked 
(too long ago, though).  I suspect that a license (in the contractual sense) 
can deal with more than copyright.  And, if Bison spits out copyright notices, 
they still only apply to that part of the output, if any, that qualifies as 
copyrightable subject matter.  

Has the Bison claim ever been tested in court?  Has anyone been pursued or 
challenged for infringement? I'm just curious.  

 - Dennis

-Original Message-
From: Norbert Thiebaud [mailto:nthieb...@gmail.com] 
Sent: Wednesday, September 28, 2011 22:11
To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org
Subject: Re: A systematic approach to IP review?

On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton
 wrote:
> I'll stand by my original statement.
>
> I'm not going to get into the Pixar case since it doesn't apply here.

I did not say it applied to the Visual Studio generated cruft... I
merely commented on the blanket assertion that 'computer generated =>
no copyright'.
>
> The Bison manual may have license conditions on what can be done with the
> generated artifact, but I suggest that is not about copyrightable subject
> matter in the artifact.
Actually it is. The only claim they could legally have _is_ on the
generated bits that are substantial pieces of code copied from templates
they provide, namely in the case of a Bison-generated parser the whole
parser skeleton needed to exploit the generated state graph. The whole
paragraph is about the copyright disposition of these bits, and in the
case of Bison they explicitly grant you a license to use these bits in
the 'normal' use case... My point being that the existence of that
paragraph also disproves the assertion that 'computer generated => no
copyright'.

You could write a program that prints itself... the mere fact that it
prints itself does not mean you lose the copyright on your program...

That being said, I do think you are in the clear with the Visual
Studio generated cruft... but not merely because there is 'computer
generation' involved.


Norbert



Re: A systematic approach to IP review?

2011-09-28 Thread Norbert Thiebaud
On Wed, Sep 28, 2011 at 7:55 PM, Dennis E. Hamilton
 wrote:
> I'll stand by my original statement.
>
> I'm not going to get into the Pixar case since it doesn't apply here.

I did not say it applied to the Visual Studio generated cruft... I
merely commented on the blanket assertion that 'computer generated =>
no copyright'.
>
> The Bison manual may have license conditions on what can be done with the
> generated artifact, but I suggest that is not about copyrightable subject
> matter in the artifact.
Actually it is. The only claim they could legally have _is_ on the
generated bits that are substantial pieces of code copied from templates
they provide, namely in the case of a Bison-generated parser the whole
parser skeleton needed to exploit the generated state graph. The whole
paragraph is about the copyright disposition of these bits, and in the
case of Bison they explicitly grant you a license to use these bits in
the 'normal' use case... My point being that the existence of that
paragraph also disproves the assertion that 'computer generated => no
copyright'.

You could write a program that prints itself... the mere fact that it
prints itself does not mean you lose the copyright on your program...

That being said, I do think you are in the clear with the Visual
Studio generated cruft... but not merely because there is 'computer
generation' involved.


Norbert


Re: A systematic approach to IP review?

2011-09-28 Thread Pedro F. Giffuni

--- On Wed, 9/28/11, Norbert Thiebaud wrote:
...
> On Wed, Sep 28, 2011 at 5:42 PM,
> Dennis E. Hamilton wrote:
> > It is unlikely that machine-generated files of any
> > kind are copyrightable subject matter.
> 
> I'd imagine that Pixar, for instance, would have a problem
> with that
> blanket statement...
> 
> The very existence of this paragraph in the Bison manual :
> http://www.gnu.org/s/bison/manual/bison.html#Conditions
> also raises doubt as to the validity of the premise.
> 

Ugh... I am not a lawyer and I normally prefer not to have
to read all that, but OOo requires Bison to build, so if that
paragraph still applies we should be using yacc instead.

Pedro.



RE: A systematic approach to IP review?

2011-09-28 Thread Dennis E. Hamilton
I'll stand by my original statement.

I'm not going to get into the Pixar case since it doesn't apply here.

The Bison manual may have license conditions on what can be done with the 
generated artifact, but I suggest that is not about copyrightable subject 
matter in the artifact.  A similar condition would be one in, let's say for a 
hypothetical case, Visual C++ 2008 Express Edition requiring that generated 
code be run on Windows.  It's not about copyright.  

And I agree, one must understand license conditions that apply to the tool used 
to make the generated artifacts.  I did neglect to consider that.

 - Dennis

-Original Message-
From: Norbert Thiebaud [mailto:nthieb...@gmail.com] 
Sent: Wednesday, September 28, 2011 16:41
To: ooo-dev@incubator.apache.org; dennis.hamil...@acm.org
Subject: Re: A systematic approach to IP review?

On Wed, Sep 28, 2011 at 5:42 PM, Dennis E. Hamilton
 wrote:
> It is unlikely that machine-generated files of any kind are copyrightable 
> subject matter.

I'd imagine that Pixar, for instance, would have a problem with that
blanket statement...

The very existence of this paragraph in the Bison manual:
http://www.gnu.org/s/bison/manual/bison.html#Conditions
also raises doubt as to the validity of the premise.

Norbert



Re: A systematic approach to IP review?

2011-09-28 Thread Norbert Thiebaud
On Wed, Sep 28, 2011 at 5:42 PM, Dennis E. Hamilton
 wrote:
> It is unlikely that machine-generated files of any kind are copyrightable 
> subject matter.

I'd imagine that Pixar, for instance, would have a problem with that
blanket statement...

The very existence of this paragraph in the Bison manual:
http://www.gnu.org/s/bison/manual/bison.html#Conditions
also raises doubt as to the validity of the premise.

Norbert


Re: A systematic approach to IP review?

2011-09-28 Thread Rob Weir
On Wed, Sep 28, 2011 at 6:42 PM, Dennis E. Hamilton
 wrote:
> It is unlikely that machine-generated files of any kind are copyrightable 
> subject matter.  I would think that files generated by Visual Studio should 
> just be regenerated, especially if this has to do with preprocessor 
> pre-compilation, project boiler-plate (and even build/make) files, 
> MIDL-compiled files, resource-compiler output, and the like.
>

That is my understanding as well, wrt computer-generated files.
However the lack of copyright does not mean lack of concern.  For
example, some code generation applications have a license that puts
additional restrictions on the generated code.  Some versions of GNU
Bison, the YACC variant, did that.


> (I assume there are no MFC dependencies unless MFC has somehow shown up under 
> VC++ 2008 Express Edition or the corresponding SDK -- I am behind the times.  
> I thought the big issue was ATL.)
>
> Meanwhile, I favor what you say about having a file at the folder level of 
> the buildable components.  It strikes me as a visible way to ensure that the 
> IP review has been completed and is current.  It also has great transparency 
> and accountability since the document is in the SVN itself.  It also survives 
> being extracted from the SVN, included in a tar-ball, etc.  In short: nice!
>
>  - Dennis
>
> -Original Message-
> From: Mathias Bauer [mailto:mathias_ba...@gmx.net]
> Sent: Wednesday, September 28, 2011 04:25
> To: ooo-dev@incubator.apache.org
> Subject: Re: A systematic approach to IP review?
>
> On 19.09.2011 02:27, Rob Weir wrote:
>
>> 1) We need to get all files needed for the build into SVN.  Right now
>> there are some that are copied down from the OpenOffice.org website
>> during the build's bootstrap process.   Until we get the files all in
>> one place it is hard to get a comprehensive view of our dependencies.
>
> If you want svn to be the place for the IP review, we have to do it in
> two steps. There are some CWSs for post-3.4 that bring in new files.
> Setting up a branch now to bring them to svn will create additional work
> that IMHO is better done later.
>
>>
>> 2) Continue the CWS integrations.  Along with 1) this ensures that all
>> the code we need for the release is in SVN.
>
> see above
>
>> e) (Hypothetically) files that are not under an OSS license at all.
>> E.g., a Microsoft header file.  These must be removed.
>
> I assume that you are talking about header files with a MS copyright,
> not header files generated from e.g. Visual Studio. In my understanding
> these files should be considered as contributed under the rules of the
> OOo project and so now their copyright owner is Oracle.
>
> 5) We should track the resolution of each file, and do this
>> publicly.  The audit trail is important.  Some ways we could do this
>> might be:
>>
>> a) Track this in SVN properties.
> IMHO this is the best solution. svn is the place of truth if it comes
> down to files.
>
> The second best solution would be to have one text file per build unit
> (that would be a gbuild makefile in the new build system) or per module
> (that would be a sub folder of the sub-repos). The file should be
> checked in in svn.
>
> Everything else (spreadsheets or whatsoever) could be generated from
> that, in case anyone had a need for a spreadsheet with >6 rows
> containing license information. ;-)
>
> Regards,
> Mathias
>
>


RE: A systematic approach to IP review?

2011-09-28 Thread Dennis E. Hamilton
It is unlikely that machine-generated files of any kind are copyrightable 
subject matter.  I would think that files generated by Visual Studio should 
just be regenerated, especially if this has to do with preprocessor 
pre-compilation, project boiler-plate (and even build/make) files, 
MIDL-compiled files, resource-compiler output, and the like.  

(I assume there are no MFC dependencies unless MFC has somehow shown up under 
VC++ 2008 Express Edition or the corresponding SDK -- I am behind the times.  I 
thought the big issue was ATL.)

Meanwhile, I favor what you say about having a file at the folder level of the 
buildable components.  It strikes me as a visible way to ensure that the IP 
review has been completed and is current.  It also has great transparency and 
accountability since the document is in the SVN itself.  It also survives being 
extracted from the SVN, included in a tar-ball, etc.  In short: nice!

 - Dennis

-Original Message-
From: Mathias Bauer [mailto:mathias_ba...@gmx.net] 
Sent: Wednesday, September 28, 2011 04:25
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

On 19.09.2011 02:27, Rob Weir wrote:

> 1) We need to get all files needed for the build into SVN.  Right now
> there are some that are copied down from the OpenOffice.org website
> during the build's bootstrap process.   Until we get the files all in
> one place it is hard to get a comprehensive view of our dependencies.

If you want svn to be the place for the IP review, we have to do it in 
two steps. There are some CWSs for post-3.4 that bring in new files.
Setting up a branch now to bring them to svn will create additional work
that IMHO is better done later.

>
> 2) Continue the CWS integrations.  Along with 1) this ensures that all
> the code we need for the release is in SVN.

see above

> e) (Hypothetically) files that are not under an OSS license at all.
> E.g., a Microsoft header file.  These must be removed.

I assume that you are talking about header files with a MS copyright, 
not header files generated from e.g. Visual Studio. In my understanding 
these files should be considered as contributed under the rules of the 
OOo project and so now their copyright owner is Oracle.

> 5) We should track the resolution of each file, and do this
> publicly.  The audit trail is important.  Some ways we could do this
> might be:
>
> a) Track this in SVN properties.
IMHO this is the best solution. svn is the place of truth if it comes 
down to files.

The second best solution would be to have one text file per build unit 
(that would be a gbuild makefile in the new build system) or per module 
(that would be a sub folder of the sub-repos). The file should be 
checked in in svn.

Everything else (spreadsheets or whatsoever) could be generated from 
that, in case anyone had a need for a spreadsheet with >6 rows 
containing license information. ;-)

Regards,
Mathias



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-28 Thread Michael Stahl
On 28.09.2011 17:32, Pedro F. Giffuni wrote:
> FWIW;
> 
> I don't like the patches because I can't really examine well
> the code, besides this is something the VCS handles acceptably:
> commit the original sourcecode and then apply the patches in a
> different commit. If we start with up to date versions there
> would not be much trouble.

If we didn't have many thousands of lines of patches to rebase, then
upgrading to less outdated versions wouldn't be such a PITA.

Sadly, in many cases upstreaming patches was never sufficiently high on the
priority list to actually get done...

-- 
"Dealing with failure is easy: Work hard to improve.
 Success is also easy to handle: You've solved the wrong problem.
 Work hard to improve." -- Alan Perlis



RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-28 Thread Pedro F. Giffuni
The idea (not originally mine) is to keep only compatibly
licensed code under an isolated (3rdparty) directory.

I think in the long run we should try to use the system versions
of such software when available, and every Linux/BSD distribution
is probably doing that for LO already.

Pedro.

--- On Wed, 9/28/11, Dennis E. Hamilton  wrote:

> The problem with bringing the 3rd
> party software completely into the SVN tree and modifying it
> in the tree has to do with the license the updated software
> is under.  In that case, there *is* a code provenance
> issue and I believe it crosses a line that the Apache
> Software Foundation is unwilling to cross with regard to the
> integrity of its code bases.
> 
> The current patches to Boost, for example, do not change
> the license on the code and preserve the Boost
> license.  But since this is ephemeral and the source is
> never in the SVN tree (is that correct?) the derivative use
> disappears at the end of a build.  It is sufficient
> then to include the dependency in the NOTICE for the release
> and not worry further.
> 
> Also, the current dependency is several releases behind the
> current Boost release.  This might not matter - the
> specific Boost libraries that are used might not be
> affected.  But there is a release synchronization
> issue.  A fork would have to be maintained.  Also,
> the dependencies are managed better now, rather than having
> the entire Boost library installed for cherry picking.
> 
> (This will all change at some point, since Boost is being
> incorporated into ISO C++.  It is probably best to wait
> for that to ripple out into the compiler distributions.)
> 
>  - Dennis
> 
> -Original Message-
> From: Pedro F. Giffuni [mailto:giffu...@tutopia.com]
> 
> Sent: Wednesday, September 28, 2011 08:32
> To: ooo-dev@incubator.apache.org
> Subject: Re: handling of ext_sources - Juergen's suggestion
> [was: Re: A systematic approach to IP review?]
> 
> FWIW;
> 
> I don't like the patches because I can't really examine the code
> well; besides, this is something the VCS handles acceptably:
> commit the original source code and then apply the patches in a
> different commit. If we start with up-to-date versions there
> would not be much trouble.
> 
> just my $0.02, not an objection.
> 
> Pedro.
> 
> --- On Wed, 9/28/11, Jürgen Schmidt wrote:
> 
> ...
> 
> > > I wouldn't give up the patches, as they allow us to handle
> > > updates better. This would cause a problem, as direct changes
> > > to the 3rd party stuff without additional authorization
> > > (means: changing the source code must not happen accidentally,
> > > only when the 3rd party code gets an update from upstream)
> > > must be prevented, while patch files must still be allowed to
> > > be added, removed, or changed, not the original source code.
> > > If that wasn't possible or too cumbersome, checking in the
> > > tarballs in "3rdparty" would be better.
> > 
> > I also wouldn't give up the patches, and for that reason I would
> > like to move forward for now with keeping the tarballs as
> > proposed. But I like the name "3rdparty" for the directory and
> > we can later on change it from the tarballs to the unpacked code
> > if we see demand for it. At the moment it's just easier to keep
> > the tarballs and focus on other work.
> > 
> > > As svn users never download the complete history as DSCM users
> > > do, the pain of binary files in the repo isn't that hard. In
> > > case AOOo moved to a DSCM again later, the tarballs could be
> > > moved out again easily.
> > 
> > Agree, we don't really lose anything, can change if necessary,
> > and can continue with our work.
> > 
> > Juergen
> 
> 
>


Re: A systematic approach to IP review?

2011-09-28 Thread Mathias Bauer

On 19.09.2011 02:27, Rob Weir wrote:


1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.


If you want svn to be the place for the IP review, we have to do it in 
two steps. There are some cws for post-3.4 that bring in new files. 
Setting up a branch now to bring them to svn would create additional 
work that IMHO is better done later.




2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.


see above


e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.


I assume that you are talking about header files with a MS copyright, 
not header files generated from e.g. Visual Studio. In my understanding 
these files should be considered as contributed under the rules of the 
OOo project and so now their copyright owner is Oracle.



5) We should to track the resolution of each file, and do this
publicly.  The audit trail is important.  Some ways we could do this
might be:

a) Track this in SVN properties.

IMHO this is the best solution. svn is the place of truth if it comes 
down to files.
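A minimal sketch of what that could look like with stock svn tooling (the 
property name "aoo:ip" and the file paths are purely hypothetical, not an 
agreed convention):

  # record a review result on a file ("aoo:ip" is a made-up property name)
  svn propset aoo:ip "category-b: MPL 1.1" main/some/module/file.cxx
  # read it back for a single file
  svn propget aoo:ip main/some/module/file.cxx
  # or dump it recursively for a whole module
  svn propget -R aoo:ip main/some/module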


The second best solution would be to have one text file per build unit 
(that would be a gbuild makefile in the new build system) or per module 
(that would be a sub folder of the sub-repos). The file should be 
checked in to svn.


Everything else (spreadsheets or whatsoever) could be generated from 
that, in case anyone had a need for a spreadsheet with >6 rows 
containing license information. ;-)


Regards,
Mathias


RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-28 Thread Dennis E. Hamilton
The problem with bringing the 3rd party software completely into the SVN tree 
and modifying it in the tree has to do with the license the updated software is 
under.  In that case, there *is* a code provenance issue and I believe it 
crosses a line that the Apache Software Foundation is unwilling to cross with 
regard to the integrity of its code bases.

The current patches to Boost, for example, do not change the license on the 
code and preserve the Boost license.  But since this is ephemeral and the 
source is never in the SVN tree (is that correct?) the derivative use 
disappears at the end of a build.  It is sufficient then to include the 
dependency in the NOTICE for the release and not worry further.

Also, the current dependency is several releases behind the current Boost 
release.  This might not matter - the specific Boost libraries that are used 
might not be affected.  But there is a release synchronization issue.  A fork 
would have to be maintained.  Also, the dependencies are managed better now, 
rather than having the entire Boost library installed for cherry picking.

(This will all change at some point, since Boost is being incorporated into ISO 
C++.  It is probably best to wait for that to ripple out into the compiler 
distributions.)

 - Dennis

-Original Message-
From: Pedro F. Giffuni [mailto:giffu...@tutopia.com] 
Sent: Wednesday, September 28, 2011 08:32
To: ooo-dev@incubator.apache.org
Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A 
systematic approach to IP review?]

FWIW;

I don't like the patches because I can't really examine well
the code, besides this is something the VCS handles acceptably:
commit the original sourcecode and then apply the patches in a
different commit. If we start with up to date versions there
would not be much trouble.

just my $0.02, not an objection.

Pedro.

--- On Wed, 9/28/11, Jürgen Schmidt  wrote:

...

> > I wouldn't give up the patches, as they allow to handle updates
> > better. This would cause a problem, as direct changes to the 3rd
> > party stuff without additional authorization (means: changing the
> > source code must not happen accidentally, only when the 3rd party
> > code gets an update from upstream) must be prevented, while patch
> > files must still be allowed to be added, removed, or changed, not
> > the original source code. If that wasn't possible or too
> > cumbersome, checking in the tarballs in "3rdparty" would be better.
> >
> 
> i also wouldn't give up the patches and for that reason i would
> like to move forward for now with keeping the tarballs as proposed.
> But i like the name "3rdparty" for the directory and we can later
> on change it from the tarballs to the unpacked code if we see
> demand for it. At the moment it's just easier to keep the tarballs
> and focus on other work.
> 
> 
> >
> > As svn users never download the complete history as DSCM users
> > do, the pain of binary files in the repo isn't that hard. In case
> > AOOo moved to a DSCM again later, the tarballs could be moved out
> > again easily.
> >
> 
> agree, we don't really lose anything, can change if necessary and
> can continue with our work
> 
> Juergen
> 



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-28 Thread Pedro F. Giffuni
FWIW;

I don't like the patches because I can't really examine well
the code, besides this is something the VCS handles acceptably:
commit the original sourcecode and then apply the patches in a
different commit. If we start with up to date versions there
would not be much trouble.

just my $0.02, not an objection.

Pedro.

--- On Wed, 9/28/11, Jürgen Schmidt  wrote:

...

> > I wouldn't give up the patches, as they allow to handle updates
> > better. This would cause a problem, as direct changes to the 3rd
> > party stuff without additional authorization (means: changing the
> > source code must not happen accidentally, only when the 3rd party
> > code gets an update from upstream) must be prevented, while patch
> > files must still be allowed to be added, removed, or changed, not
> > the original source code. If that wasn't possible or too
> > cumbersome, checking in the tarballs in "3rdparty" would be better.
> >
> 
> i also wouldn't give up the patches and for that reason i would
> like to move forward for now with keeping the tarballs as proposed.
> But i like the name "3rdparty" for the directory and we can later
> on change it from the tarballs to the unpacked code if we see
> demand for it. At the moment it's just easier to keep the tarballs
> and focus on other work.
> 
> 
> >
> > As svn users never download the complete history as DSCM users
> > do, the pain of binary files in the repo isn't that hard. In case
> > AOOo moved to a DSCM again later, the tarballs could be moved out
> > again easily.
> >
> 
> agree, we don't really lose anything, can change if necessary and
> can continue with our work
> 
> Juergen
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-28 Thread Jürgen Schmidt
On Wed, Sep 28, 2011 at 9:23 AM, Mathias Bauer wrote:

> What might be the best way to handle 3rd party code in AOOo probably will
> depend on the needs of the developers as well as on legal requirements.
>
> We had these tarballs plus patches IIRC because Sun Legal required that all
> used 3rd party stuff should be preserved in our repos in its original form.
>
> As a developer I always had preferred to have 3rd party code treated in the
> *build* like the internal source code.
>
> So if there wasn't a requirement to have unpatched sources in the
> repository, the most natural way to keep 3rd party stuff would be to have a
> third sub-repo "3rdparty" next to "main" and "extras" with the 3rd party
> stuff checked in. Not the tarballs, just the unpacked content.
>
> I wouldn't give up the patches, as they allow to handle updates better.
> This would cause a problem, as direct changes to the 3rd party stuff without
> additional authorization (means: changing the source code must not happen
> accidentally, only when the 3rd party code gets an update from upstream) must
> be prevented, while patch files must still be allowed to be added, removed,
> or changed, not the original source code. If that wasn't possible or too
> cumbersome, checking in the tarballs in "3rdparty" would be better.
>

i also wouldn't give up the patches and for that reason i would like to move
forward for now with keeping the tarballs as proposed. But i like the name
"3rdparty" for the directory and we can later on change it from the tarballs
to the unpacked code if we see demand for it. At the moment it's just easier
to keep the tarballs and focus on other work.


>
> As svn users never download the complete history as DSCM users do, the pain
> of binary files in the repo isn't that hard. In case AOOo moved to a DSCM
> again later, the tarballs could be moved out again easily.
>

agree, we don't really lose anything, can change if necessary and can
continue with our work

Juergen


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-28 Thread Mathias Bauer

On 20.09.2011 16:36, Pavel Janík wrote:

Have we ever considered using version control to...uh...manage file
versions?

Just an idea.



Maybe Heiner will say more, but in the past, we have had the external
tarballs in the VCS, but then we moved them out and it worked very
well. There never was a reason to track external.tar.gz files in VCS,
because we do not change them.
What might be the best way to handle 3rd party code in AOOo probably 
will depend on the needs of the developers as well as on legal requirements.


We had these tarballs plus patches IIRC because Sun Legal required that 
all used 3rd party stuff should be preserved in our repos in its 
original form.


As a developer I always had preferred to have 3rd party code treated in 
the *build* like the internal source code.


So if there wasn't a requirement to have unpatched sources in the 
repository, the most natural way to keep 3rd party stuff would be to 
have a third sub-repo "3rdparty" next to "main" and "extras" with the 
3rd party stuff checked in. Not the tarballs, just the unpacked content.


I wouldn't give up the patches, as they allow to handle updates better. 
This would cause a problem, as direct changes to the 3rd party stuff 
without additional authorization (means: changing the source code must 
not happen accidentally, only when the 3rd party code gets an update from 
upstream) must be prevented, while patch files must still be allowed to 
be added, removed, or changed, not the original source code. If that 
wasn't possible or too cumbersome, checking in the tarballs in "3rdparty" 
would be better.


As svn users never download the complete history as DSCM users do, the 
pain of binary files in the repo isn't that hard. In case AOOo moved to 
a DSCM again later, the tarballs could be moved out again easily.


Regards,
Mathias


RE: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Dennis E. Hamilton
You can get anything off of the web interface of SVN at the individual level 
without it being in a working copy, though of course it has to be somewhere 
local while it is being processed in a build.

But if you check-out the trunk, you get everything that is in the trunk HEAD 
(or a specified) version.

As far as I know, you can do a check-out anywhere deeper in the tree and avoid 
everything not at that node [and below].  For example, just checkout 
trunk/main.  
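Concretely, that is just (a sketch, with the repository URL quoted 
elsewhere in this thread):

  # check out only the main subtree, not the whole incubator/ooo tree
  svn checkout http://svn.apache.org/repos/asf/incubator/ooo/trunk/main ooo-main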

It takes some consideration of SVN organization to have the desired flavors in 
convenient chunks that people can work with without having to eat the whole 
thing (with regard to SVN checkout, SVN update and, of course, SVN commits).  I 
can testify that an SVN UPDATE of the working copy of the entire incubator/ooo/ 
subtree is a painful experience, even when there is nothing to update.

 - Dennis

PS: I find it an interesting characteristic of SVN that trunk, tags, and 
branches are just names of folders and don't mean anything special to SVN.  The 
nomenclature and its use is a matter of custom, like code indentation rules 
for { ... }.


-Original Message-
From: Rob Weir [mailto:robw...@apache.org] 
Sent: Thursday, September 22, 2011 05:24
To: ooo-dev@incubator.apache.org
Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A 
systematic approach to IP review?]

2011/9/22 Pavel Janík :
>> Proposed way to move forward
>>
>> 1. put the externals under .../trunk/ext_sources
>> .../trunk/ext_sources
>> .../trunk/main
>> .../trunk/extras
>> 2. adapt configure to use this as default, disable the download (maybe
>> reactivate it later if we move to a DSCM)
>> 3. keep the process with checking the md5 sum as it is (for potential later
>> use)
>>
>> Any opinions or suggestions?
>
>
> +1.
>
> And one more question:
>
> If we put something into SVN into .../trunk/ext_sources, do we have some URL 
> that can replace http://hg so users don't have to check out everything? 
> Ie. do we have a URL where we have "real checkout" of the SVN? Some SVN web 
> interface? Don't know Apache infra well yet... That would be real killer 
> solution!
> --

I was thinking something similar.  We only need to use the SVN
interface to the files when we're adding or updating.  But we can have
bootstrap continue to download via http.  The location, using
Juergen's proposed location, would be
http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources

This would save having a duplicate local SVN working copy of the file, right?

-Rob



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Rob Weir
On Thu, Sep 22, 2011 at 9:40 AM, Shao Zhi Zhao  wrote:

> hi,
>
> Based on this result, an other trunk will be like the following if IBM
> symphony checked in:
> /ooo/symphony-src/trunk/main
> /ooo/symphony-src/trunk/extras
> /ooo/symphony-src/tags
> /ooo/symphony-src/branches
>
> thus it introduces a problem:
> How to merge the two trunks of symphony-src and ooo-src?
>
I don't think moving the tree down one level introduces any new problems
for Symphony, so long as the directories within */main remain the same.

Of course, merging code from Symphony into AOOo will be difficult in
general.  The problem is how do we establish a common "ancestor" revision to
do a 3-way merge with?  This will really depend on whether Symphony has a
good record of what the corresponding OOo revision was for each of its
initial files.

If not, then you can do a text diff and do some merging without trouble.
But dealing with renamed files, or moved files, or deleted files, these are
trickier to process automatically.

If you don't have that history, then in theory it could be reestablished by
taking the initial revision of each file in Symphony and comparing it to
each revision of the same file in OOo Mercurial, and finding which revision
matches.  It might be possible to establish enough context for a 3-way merge
that way.
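A rough sketch of that content-hash hunt for a single file (the file path 
and repository locations are illustrative assumptions, not Symphony 
specifics):

  # run inside a checkout of the OOo Mercurial repository
  f=sw/source/core/doc/docnew.cxx              # illustrative path
  want=$(md5sum "../symphony/$f" | cut -d' ' -f1)
  # walk every revision that touched the file and compare content hashes
  for rev in $(hg log --template '{rev}\n' "$f"); do
      have=$(hg cat -r "$rev" "$f" | md5sum | cut -d' ' -f1)
      [ "$have" = "$want" ] && echo "content matches OOo revision $rev"
  done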

-Rob


>
>
> thanks
>
> mail:zhaos...@cn.ibm.com
> Address:2/F,Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No.8,
> Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193,
> P.R.China
>
> Rob Weir, 2011-09-22 21:18
> To: ooo-dev@incubator.apache.org
> Reply-To: ooo-dev@incubator.apache.org
> Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A
> systematic approach to IP review?]
>
> 2011/9/22 Jürgen Schmidt :
> > 2011/9/22 Jürgen Schmidt 
> >
> >> On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir  wrote:
> >>
> >>>
> >>> I was thinking something similar.  We only need to use the SVN
> >>> interface to the files when we're adding or updating.  But we can have
> >>> bootstrap continue to download via http.  The location, using
> >>> Juergen's proposed location, would be
> >>> http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources
> >>>
> >>> yes, this is the correct URL, the URL that i have posted wouldn't work
> >>
> >> Juergen
> >>
> >>
> >>> This would save having a duplicate local SVN working copy of the file,
> >>> right?
> >>>
> >>>
> mmh, no or i understand something wrong. People checkout .../trunk and
> would get "ext_sources", "main" and "extras". To benefit from the
> modified script we have to put "ext_sources" besides "trunk"
> >
> > .../ooo/ext_sources
> > .../ooo/trunk/main
> > .../ooo/trunk/extras
> >
> > Means back to my initial proposal, right?
> >
>
> I think the idea is this:  Everything under ooo represents what goes
> into a release.  It can be tagged and branched.  trunk/ is a peer to a
> tags/ and branches/ directory.
>
> It is possible that we have this wrong.  Adding in site/ and ooo-site/
> brings in a different convention.  They have are set up to have
> trunk/tags/branches underneath them.  That is fine, because the
> website does not "release" in synch with an OOo release.  It makes
> sense for them to be able to tag and branch independently.
>
> We should also consider how the project grows going forward.  We know
> that other code bases will be checked in, like Symphony.  And there
> are other, small, but disjoint contributions that I'm working on as
> well.
>
> So it might make sense to move trunk down one level:
>
> /ooo/ooo-src/trunk/main
> /ooo/ooo-src/trunk/extras
> /ooo/ooo-src/trunk/ext-sources
> /ooo/ooo-src/tags
> /ooo/ooo-src/branches
>
> That would make more sense then, as a unit, since we would want to tag
> across all of /ooo/ooo-src/ to define a release.
>
> I assume a developer still just checks out ooo/ooo-src/trunk/main.  If
> they need the additional "extras" then they check that out separately.
> I don't think most users will want to check out the entire trunk all
> the time.   We should consider also how we want this tree to grow over
> time, as other related contributions come in.
>
> In the end, I think we want to preserve the ability to:
>
> 1) Preserve an audit trail of all changes that went into a release
>
> 2) Be able to tag and branch a release and everything that is in the
> release
>
> 3) Restore the exact state of a previous tagged release, including the
> exact ext-sources used in that release
>
> I'm certain that my proposal will enable this.  There may be other
> approaches that do as well.
>
> Another thing to keep in mind is the SVN support for "externals":
>
> http://svnbook.red-bean.com/en/1.0/ch07s03.html
>
> This might make some things easier.
>
> -Rob
>
> > Juergen
> >
>
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Jürgen Schmidt
On Thu, Sep 22, 2011 at 3:18 PM, Rob Weir  wrote:

> It is possible that we have this wrong.  Adding in site/ and ooo-site/
> brings in a different convention.  They are set up to have
> trunk/tags/branches underneath them.  That is fine, because the
> website does not "release" in synch with an OOo release.  It makes
> sense for them to be able to tag and branch independently.
>

agree


> We should also consider how the project grows going forward.  We know
> that other code bases will be checked in, like Symphony.  And there
> are other, small, but disjoint contributions that I'm working on as
> well.
>
> So it might make sense to move trunk down one level:
>
> /ooo/ooo-src/trunk/main
> /ooo/ooo-src/trunk/extras
> /ooo/ooo-src/trunk/ext-sources
> /ooo/ooo-src/tags
> /ooo/ooo-src/branches
>
> That would make more sense then, as a unit, since we would want to tag
> across all of /ooo/ooo-src/ to define a release.
>
>
agree, from this perspective it makes sense. The question then is when we
want to introduce this further level?


> I assume a developer still just checks out ooo/ooo-src/trunk/main.  If
> they need the additional "extras" then they check that out separately.
>  I don't think most users will want to check out the entire trunk all
> the time.   We should consider also how we want this tree to grow over
> time, as other related contributions come in.
>

i assumed that a developer will check out trunk, maybe a wrong assumption


>
> In the end, I think we want to preserve the ability to:
>
> 1) Preserve an audit trail of all changes that went into a release
>
> 2) Be able to tag and branch a release and everything that is in the
> release
>
> 3) Restore the exact state of a previous tagged release, including the
> exact ext-sources used in that release
>
> I'm certain that my proposal will enable this.  There may be other
> approaches that do as well.
>

i think so too. And with my changed mindset to not always check out trunk
completely i am fine with this approach.


>
> Another thing to keep in mind is the SVN support for "externals":
>
> http://svnbook.red-bean.com/en/1.0/ch07s03.html
>
>
interesting, i didn't know that before

Juergen


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Shao Zhi Zhao


hi,

Based on this result, an other trunk will be like the following if IBM
symphony checked in:
/ooo/symphony-src/trunk/main
/ooo/symphony-src/trunk/extras
/ooo/symphony-src/tags
/ooo/symphony-src/branches

thus it introduces a problem:
How to merge the two trunks of symphony-src and ooo-src?



thanks

mail:zhaos...@cn.ibm.com
Address:2/F,Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No.8,
Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193,
P.R.China


   
Rob Weir, 2011-09-22 21:18
To: ooo-dev@incubator.apache.org
Reply-To: ooo-dev@incubator.apache.org
Subject: Re: handling of ext_sources - Juergen's suggestion [was: Re: A
systematic approach to IP review?]




2011/9/22 Jürgen Schmidt :
> 2011/9/22 Jürgen Schmidt 
>
>> On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir  wrote:
>>
>>>
>>> I was thinking something similar.  We only need to use the SVN
>>> interface to the files when we're adding or updating.  But we can have
>>> bootstrap continue to download via http.  The location, using
>>> Juergen's proposed location, would be
>>> http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources
>>>
>>> yes, this is the correct URL, the URL that i have posted wouldn't work
>>
>> Juergen
>>
>>
>>> This would save having a duplicate local SVN working copy of the file,
>>> right?
>>>
>>>
> mmh, no or i understand something wrong. People checkout .../trunk and
> would get "ext_sources", "main" and "extras". To benefit from the
> modified script we have to put "ext_sources" besides "trunk"
>
> .../ooo/ext_sources
> .../ooo/trunk/main
> .../ooo/trunk/extras
>
> Means back to my initial proposal, right?
>

I think the idea is this:  Everything under ooo represents what goes
into a release.  It can be tagged and branched.  trunk/ is a peer to a
tags/ and branches/ directory.

It is possible that we have this wrong.  Adding in site/ and ooo-site/
brings in a different convention.  They are set up to have
trunk/tags/branches underneath them.  That is fine, because the
website does not "release" in synch with an OOo release.  It makes
sense for them to be able to tag and branch independently.

We should also consider how the project grows going forward.  We know
that other code bases will be checked in, like Symphony.  And there
are other, small, but disjoint contributions that I'm working on as
well.

So it might make sense to move trunk down one level:

/ooo/ooo-src/trunk/main
/ooo/ooo-src/trunk/extras
/ooo/ooo-src/trunk/ext-sources
/ooo/ooo-src/tags
/ooo/ooo-src/branches

That would make more sense then, as a unit, since we would want to tag
across all of /ooo/ooo-src/ to define a release.

I assume a developer still just checks out ooo/ooo-src/trunk/main.  If
they need the additional "extras" then they check that out separately.
 I don't think most users will want to check out the entire trunk all
the time.   We should consider also how we want this tree to grow over
time, as other related contributions come in.

In the end, I think we want to preserve the ability to:

1) Preserve an audit trail of all changes that went into a release

2) Be able to tag and branch a release and everything that is in the
release

3) Restore the exact state of a previous tagged release, including the
exact ext-sources used in that release

I'm certain that my proposal will enable this.  There may be other
approaches that do as well.

Another thing to keep in mind is the SVN support for "externals":

http://svnbook.red-bean.com/en/1.0/ch07s03.html

This might make some things easier.

-Rob

> Juergen
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Rob Weir
2011/9/22 Jürgen Schmidt :
> 2011/9/22 Jürgen Schmidt 
>
>> On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir  wrote:
>>
>>>
>>> I was thinking something similar.  We only need to use the SVN
>>> interface to the files when we're adding or updating.  But we can have
>>> bootstrap continue to download via http.  The location, using
>>> Juergen's proposed location, would be
>>> http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources
>>>
>>> yes, this is the correct URL, the URL that i have posted wouldn't work
>>
>> Juergen
>>
>>
>>> This would save having a duplicate local SVN working copy of the file,
>>> right?
>>>
>>>
> mmh, no or i understand something wrong. People checkout .../trunk and would
> get "ext_sources", "main" and "extras". To benefit from the modified script
> we have to put "ext_sources" besides "trunk"
>
> .../ooo/ext_sources
> .../ooo/trunk/main
> .../ooo/trunk/extras
>
> Means back to my initial proposal, right?
>

I think the idea is this:  Everything under ooo represents what goes
into a release.  It can be tagged and branched.  trunk/ is a peer to a
tags/ and branches/ directory.

It is possible that we have this wrong.  Adding in site/ and ooo-site/
brings in a different convention.  They are set up to have
trunk/tags/branches underneath them.  That is fine, because the
website does not "release" in synch with an OOo release.  It makes
sense for them to be able to tag and branch independently.

We should also consider how the project grows going forward.  We know
that other code bases will be checked in, like Symphony.  And there
are other, small, but disjoint contributions that I'm working on as
well.

So it might make sense to move trunk down one level:

/ooo/ooo-src/trunk/main
/ooo/ooo-src/trunk/extras
/ooo/ooo-src/trunk/ext-sources
/ooo/ooo-src/tags
/ooo/ooo-src/branches

That would make more sense then, as a unit, since we would want to tag
across all of /ooo/ooo-src/ to define a release.

I assume a developer still just checks out ooo/ooo-src/trunk/main.  If
they need the additional "extras" then they check that out separately.
 I don't think most users will want to check out the entire trunk all
the time.   We should consider also how we want this tree to grow over
time, as other related contributions come in.

In the end, I think we want to preserve the ability to:

1) Preserve an audit trail of all changes that went into a release

2) Be able to tag and branch a release and everything that is in the release

3) Restore the exact state of a previous tagged release, including the
exact ext-sources used in that release

I'm certain that my proposal will enable this.  There may be other
approaches that do as well.
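For instance, abilities 2) and 3) map onto svn's cheap server-side copy 
operation. A sketch using the proposed ooo-src layout (the tag name is 
illustrative):

  # tag everything that makes up a release in one atomic copy
  svn copy http://svn.apache.org/repos/asf/incubator/ooo/ooo-src/trunk \
           http://svn.apache.org/repos/asf/incubator/ooo/ooo-src/tags/AOO-3.4.0 \
           -m "tag AOOo 3.4.0"
  # and later restore exactly what was released, ext-sources included
  svn checkout http://svn.apache.org/repos/asf/incubator/ooo/ooo-src/tags/AOO-3.4.0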

Another thing to keep in mind is the SVN support for "externals":

http://svnbook.red-bean.com/en/1.0/ch07s03.html

This might make some things easier.
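As a sketch of how that could apply here, an externals definition would let 
every trunk working copy pull in a shared ext_sources location without 
physically duplicating it (the side-by-side layout follows Juergen's 
initial proposal; nothing is decided):

  # run inside a trunk working copy; pre-1.5 syntax is "<local-dir> <URL>"
  svn propset svn:externals \
      "ext_sources http://svn.apache.org/repos/asf/incubator/ooo/ext_sources" .
  svn commit -m "pull ext_sources in via svn:externals"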

-Rob

> Juergen
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Jürgen Schmidt
2011/9/22 Jürgen Schmidt 

> On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir  wrote:
>
>>
>> I was thinking something similar.  We only need to use the SVN
>> interface to the files when we're adding or updating.  But we can have
>> bootstrap continue to download via http.  The location, using
>> Juergen's proposed location, would be
>> http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources
>>
>> yes, this is the correct URL, the URL that i have posted wouldn't work
>
> Juergen
>
>
>> This would save having a duplicate local SVN working copy of the file,
>> right?
>>
>>
mmh, no or i understand something wrong. People checkout .../trunk and would
get "ext_sources", "main" and "extras". To benefit from the modified script
we have to put "ext_sources" besides "trunk"

.../ooo/ext_sources
.../ooo/trunk/main
.../ooo/trunk/extras

Means back to my initial proposal, right?

Juergen


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Jürgen Schmidt
On Thu, Sep 22, 2011 at 2:23 PM, Rob Weir  wrote:

>
> I was thinking something similar.  We only need to use the SVN
> interface to the files when we're adding or updating.  But we can have
> bootstrap continue to download via http.  The location, using
> Juergen's proposed location, would be
> http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources
>
> yes, this is the correct URL, the URL that i have posted wouldn't work

Juergen


> This would save having a duplicate local SVN working copy of the file,
> right?
>
> -Rob
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Rob Weir
2011/9/22 Pavel Janík :
>> Proposed way to move forward
>>
>> 1. put the externals under .../trunk/ext_sources
>> .../trunk/ext_sources
>> .../trunk/main
>> .../trunk/extras
>> 2. adapt configure to use this as default, disable the download (maybe
>> reactivate it later if we move to a DSCM)
>> 3. keep the process with checking the md5 sum as it is (for potential later
>> use)
>>
>> Any opinions or suggestions?
>
>
> +1.
>
> And one more question:
>
> If we put something into SVN into .../trunk/ext_sources, do we have some URL 
> that can replace http://hg so users don't have to check out everything? 
> Ie. do we have a URL where we have "real checkout" of the SVN? Some SVN web 
> interface? Don't know Apache infra well yet... That would be real killer 
> solution!
> --

I was thinking something similar.  We only need to use the SVN
interface to the files when we're adding or updating.  But we can have
bootstrap continue to download via http.  The location, using
Juergen's proposed location, would be
http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext-sources

This would save having a duplicate local SVN working copy of the file, right?

-Rob


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Pavel Janík
> don't know if it is what you are looking for but
> 
> wget http://svn.apache.org/viewvc/incubator/ooo/trunk/main/?view=co
> 
> should download the head version.

Then we should be able to have both things solved - files in SVN and with a 
relatively small change in the download script also the remote fetching of the 
files if we do not have ext_sources local checkout.
-- 
Pavel Janík





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Jürgen Schmidt
2011/9/22 Pavel Janík 

> > Proposed way to move forward
> >
> > 1. put the externals under .../trunk/ext_sources
> > .../trunk/ext_sources
> > .../trunk/main
> > .../trunk/extras
> > 2. adapt configure to use this as default, disable the download (maybe
> > reactivate it later if we move to a DSCM)
> > 3. keep the process with checking the md5 sum as it is (for potential
> later
> > use)
> >
> > Any opinions or suggestions?
>
>
> +1.
>
> And one more question:
>
> If we put something into SVN into .../trunk/ext_sources, do we have some
> URL that can replace http://hg so users don't have to check out
> everything? Ie. do we have a URL where we have "real checkout" of the SVN?
> Some SVN web interface? Don't know Apache infra well yet... That would be
> real killer solution!
>

don't know if it is what you are looking for but

wget http://svn.apache.org/viewvc/incubator/ooo/trunk/main/?view=co

should download the head version.
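The bootstrap's download step could then stay nearly unchanged. A sketch of 
the fetch-or-use-local logic (the variable name is made up; the tarball 
name is the md5-prefixed scheme mentioned elsewhere in this thread):

  # prefer a local ext_sources checkout, otherwise fetch over plain http
  tarball=d70951c80dabecc2892c919ff5d07172-db-4.7.25.NC-custom.tar.gz
  if [ -f "../ext_sources/$tarball" ]; then
      cp "../ext_sources/$tarball" .
  else
      wget "http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext_sources/$tarball"
  fi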

Juergen



> --
> Pavel Janík
>
>
>
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Armin Le Grand

On 22.09.2011 13:19, Jürgen Schmidt wrote:

On Thu, Sep 22, 2011 at 12:40 AM, Jens-Heiner Rechtien wrote:


On 09/20/2011 05:26 PM, Rob Weir wrote:

...


Placing all the external tarballs in the VCS is a real killer if using a
distributed SCM like git or Mercurial, that's why we had moved them out. As
Pavel said, it worked quite nicely. As for the audit possibility, we
referenced the external tarballs in the source tree by file name and an md5
check sum, which works just as reliably as putting them directly into the
repository.

Nowadays the DSCM have some alternative methods which deal with such blobs
but in essence they also keep them separate.

If AOOo ever plans to go back to a DSCM I would keep the source tree and
the external blobs strictly separated.

All in all the general SCM tooling community opinion trend seems to be that
a S(ource)CM system is for, well, source, and external dependencies are
better handled with another mechanism, like Maven or so.

With SVN all this is less of a concern, naturally.

ok, we have several arguments for and against but no decision how we want
to move forward. Let us take another look at it.

1. we have a working mechanism to get the externals from somewhere, check
md5 sum, unpack, patch, build
1.1 "somewhere" is configurable during the configure step, initially the
externals are downloaded from http://hg.services.openoffice.org/binaries

2. having the externals in the repository (SVN) won't be a big issue because
a checkout always downloads only the tip version
2.1 the SCM can be used to track the used version of the externals for a
specific OO version ->  simply checkout the version tag and everything is in
place ...

3. in a DSCM it would be a real problem over time because of the increasing
space of all versions

4. we need a replacement http://hg.services.openoffice.org/binaries asap
(who knows how long the server will be available)

5. many developers probably work with a local clone of the repository using
for example git svn or something else ->  disadvantage of the increasing
space but probably acceptable if a clean local trunk will be kept and
updated

Proposed way to move forward

1. put the externals under .../trunk/ext_sources
.../trunk/ext_sources
.../trunk/main
.../trunk/extras
2. adapt configure to use this as default, disable the download (maybe
reactivate it later if we move to a DSCM)
3. keep the process with checking the md5 sum as it is (for potential later
use)

Any opinions or suggestions?


+1

Best current solution: added to SVN, where it does not really matter, and 
with a way back in case we change to a DSCM in the future.



Juergen



sincerely,
Armin
--
ALG



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Pavel Janík
> Proposed way to move forward
> 
> 1. put the externals under .../trunk/ext_sources
> .../trunk/ext_sources
> .../trunk/main
> .../trunk/extras
> 2. adapt configure to use this as default, disable the download (maybe
> reactivate it later if we move to a DSCM)
> 3. keep the process with checking the md5 sum as it is (for potential later
> use)
> 
> Any opinions or suggestions?


+1.

And one more question:

If we put something into SVN into .../trunk/ext_sources, do we have some URL 
that can replace http://hg so users don't have to check out everything? Ie. 
do we have a URL where we have "real checkout" of the SVN? Some SVN web 
interface? Don't know Apache infra well yet... That would be real killer 
solution!
-- 
Pavel Janík





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-22 Thread Jürgen Schmidt
On Thu, Sep 22, 2011 at 12:40 AM, Jens-Heiner Rechtien wrote:

> On 09/20/2011 05:26 PM, Rob Weir wrote:
>
>> 2011/9/20 Pavel Janík:
>>
>>> Have we ever considered using version control to...uh...manage file
 versions?

 Just an idea.

>>>
>>>
>>> Maybe Heiner will say more, but in the past, we have had the external
>>> tarballs in the VCS, but then we moved them out and it worked very well.
>>> There never was a reason to track external.tar.gz files in VCS, because we
>>> do not change them.
>>> --
>>>
>>
>> That's fine.  If they don't change, then doing a "svn update" will not
>> bring them down each time.
>>
>> Aside from being useful for version control, SVN is also very
>> useful as an audit trail.  So on the rare occasions when one of these
>> files does change, we know who changed it and why.  This is important
>> for ensuring the IP cleanliness of the project.
>>
>> Is your main concern performance?  Even as individual tarballs,
>> ext-sources is 86 files, 250MB.  ooo/extras is 243 files and 822 MB.
>> And ooo/main is 76,295 files for over 900MB.  So ext-sources is not a
>> huge contributor to download time.
>>
>
> Placing all the external tarballs in the VCS is a real killer if using a
> distributed SCM like git or Mercurial, that's why we had moved them out. As
> Pavel said, it worked quite nicely. As for the audit possibility, we
> referenced the external tarballs in the source tree by file name and an md5
> check sum, which works just as reliably as putting them directly into the
> repository.
>
> Nowadays the DSCM have some alternative methods which deal with such blobs
> but in essence they also keep them separate.
>
> If AOOo ever plans to go back to a DSCM I would keep the source tree and
> the external blobs strictly separated.
>
> All in all the general SCM tooling community opinion trend seems to be that
> a S(ource)CM system is for, well, source, and external dependencies are
> better handled with another mechanism, like Maven or so.
>
> With SVN all this is less of a concern, naturally.
>
ok, we have several arguments for and against but no decision how we want
to move forward. Let us take another look at it.

1. we have a working mechanism to get the externals from somewhere, check
md5 sum, unpack, patch, build
1.1 "somewhere" is configurable during the configure step, initially the
externals are downloaded from http://hg.services.openoffice.org/binaries

2. having the externals in the repository (SVN) won't be a big issue because
a checkout always downloads only the tip version
2.1 the SCM can be used to track the used version of the externals for a
specific OO version -> simply checkout the version tag and everything is in
place ...

3. in a DSCM it would be a real problem over time because of the increasing
space of all versions

4. we need a replacement http://hg.services.openoffice.org/binaries asap
(who knows how long the server will be available)

5. many developers probably work with a local clone of the repository using
for example git svn or something else -> disadvantage of the increasing
space but probably acceptable if a clean local trunk will be kept and
updated

Proposed way to move forward

1. put the externals under .../trunk/ext_sources
.../trunk/ext_sources
.../trunk/main
.../trunk/extras
2. adapt configure to use this as default, disable the download (maybe
reactivate it later if we move to a DSCM)
3. keep the process with checking the md5 sum as it is (for potential later
use)

Any opinions or suggestions?

Juergen
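For illustration, the day-to-day developer workflow under this proposal 
could look like this (a sketch; whether the existing switch is spelled 
exactly --with-external-tar is an assumption based on the configure switch 
mentioned elsewhere in this thread):

  # one shared externals checkout next to the source checkout
  svn checkout http://svn.apache.org/repos/asf/incubator/ooo/trunk/ext_sources ext_sources
  svn checkout http://svn.apache.org/repos/asf/incubator/ooo/trunk/main main
  cd main
  ./configure --with-external-tar=../ext_sources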


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-21 Thread Jens-Heiner Rechtien

On 09/20/2011 05:26 PM, Rob Weir wrote:

2011/9/20 Pavel Janík:

Have we ever considered using version control to...uh...manage file versions?

Just an idea.



Maybe Heiner will say more, but in the past, we have had the external tarballs 
in the VCS, but then we moved them out and it worked very well. There never was 
a reason to track external.tar.gz files in VCS, because we do not change them.
--


That's fine.  If they don't change, then doing a "svn update" will not
bring them down each time.

Aside from being useful for version control, SVN is also very
useful as an audit trail.  So on the rare occasions when one of these
files does change, we know who changed it and why.  This is important
for ensuring the IP cleanliness of the project.

Is your main concern performance?  Even as individual tarballs,
ext-sources is 86 files, 250MB.  ooo/extras is 243 files and 822 MB.
And ooo/main is 76,295 files for over 900MB.  So ext-sources is not a
huge contributor to download time.


Placing all the external tarballs in the VCS is a real killer if using a 
distributed SCM like git or Mercurial, that's why we had moved them out. 
As Pavel said, it worked quite nicely. As for the audit possibility, we 
referenced the external tarballs in the source tree by file name and an 
md5 check sum, which works just as reliably as putting them directly 
into the repository.


Nowadays the DSCM have some alternative methods which deal with such 
blobs but in essence they also keep them separate.


If AOOo ever plans to go back to a DSCM I would keep the source tree and 
the external blobs strictly separated.


All in all the general SCM tooling community opinion trend seems to be 
that a S(ource)CM system is for, well, source, and external dependencies 
are better handled with another mechanism, like Maven or so.


With SVN all this is less of a concern, naturally.

Heiner

--
Jens-Heiner Rechtien


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-21 Thread Martin Hollmichel

Am 20.09.2011 16:36, schrieb Pavel Janík:

Have we ever considered using version control to...uh...manage file versions?

Just an idea.


Maybe Heiner will say more, but in the past, we have had the external tarballs 
in the VCS, but then we moved them out and it worked very well. There never was 
a reason to track external.tar.gz files in VCS, because we do not change them.
I don't like the idea of having tar.gz files in version control in SCMs: it 
blows up the repository size significantly over time, and if you use an SCM 
that keeps the full repo on your hard disk, this is not what you want, at 
least once we are talking about GBs after some years.


Martin



Re: A systematic approach to IP review?

2011-09-20 Thread Rob Weir
2011/9/20 Jürgen Schmidt :
> On Tue, Sep 20, 2011 at 2:34 PM, Shane Curcuru  wrote:
>
>> So... has anyone actually run Apache RAT yet?  It has a scan only mode
>> which I'd think would be the simplest place to start.
>>
>> it's on my todo list to take a look on it, probably i will come back with
> questions
>

I did a run earlier today.  Good news is we have 4 files with Apache
license.  Bad news is we have 52,876 files with "unknown" license.  In
most cases that should just be the standard OOo header.

These scans will be much more useful after we've replaced the OOo
headers with Apache headers.  But we can't just do a global change.
We should only make that change for files that are in the official
Oracle SGA.  After that is done, then the RAT report will be more
useful.
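For anyone wanting to reproduce such a run, the basic invocation is along 
these lines (the jar version and report name are illustrative):

  # scan the source tree and write a plain-text report
  java -jar apache-rat-0.7.jar main > rat-report.txt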

> Juergen
>
>
>> Personally, I'd recommend working on basic RAT scans, with the scripts to
>> run them and any exception rules (for known files, etc.) all checked into
>> SVN with the build tools for the code.  But hey, it's easy for me to suggest
>> "we" do stuff, when I only currently have time to be a mentor and thus can
>> get away with just making suggestions.  8-)
>>
>> I like the general concept of storing the IP type for files in SVN
>> properties; although properties are easy to change, Apache does have a
>> strong history of being able to provide oversight for commit logs throughout
>> a project's history.
>>
>> - Shane
>>
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Pavel Janík
> Is your main concern performance?  Even as individual tarballs,
> ext-sources is 86 files, 250MB.  ooo/extras is 243 files and 822 MB.
> And ooo/main is 76,295 files for over 900MB.  So ext-sources is not a
> huge contributor to download time.

You have to think about compressed data. ext_sources is 250MB *after* 
compression. extras and main *can* be compressed.

But for me, it doesn't matter if it is in VCS or in a directory. And yes, VCS 
allows change tracking.

The file names on hg are prepended by MD5 sum:

Pavel-Janiks-MacBook-Pro:.ooo_tar_balls pavel$ md5sum d70951c80dabecc2892c919ff5d07172-db-4.7.25.NC-custom.tar.gz
d70951c80dabecc2892c919ff5d07172  d70951c80dabecc2892c919ff5d07172-db-4.7.25.NC-custom.tar.gz
Pavel-Janiks-MacBook-Pro:.ooo_tar_balls pavel$ 

So there is some work already done around this and it has some logic.
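That naming scheme also makes an integrity check a one-liner. A sketch, 
assuming nothing beyond the convention shown above:

  # the expected checksum is the part of the file name before the first '-'
  f=d70951c80dabecc2892c919ff5d07172-db-4.7.25.NC-custom.tar.gz
  echo "${f%%-*}  $f" | md5sum -c -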
-- 
Pavel Janík





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Rob Weir
2011/9/20 Pavel Janík:
>> Have we ever considered using version control to...uh...manage file versions?
>>
>> Just an idea.
>
>
> Maybe Heiner will say more, but in the past, we have had the external 
> tarballs in the VCS, but then we moved them out and it worked very well. 
> There never was a reason to track external.tar.gz files in VCS, because we do 
> not change them.
> --

That's fine.  If they don't change, then doing a "svn update" will not
bring them down each time.

Aside from being useful for version control, SVN is also very
useful as an audit trail.  So on the rare occasions when one of these
files does change, we know who changed it and why.  This is important
for ensuring the IP cleanliness of the project.

Is your main concern performance?  Even as individual tarballs,
ext-sources is 86 files, 250MB.  ooo/extras is 243 files and 822 MB.
And ooo/main is 76,295 files for over 900MB.  So ext-sources is not a
huge contributor to download time.

> Pavel Janík
>
>
>
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Pavel Janík
> Have we ever considered using version control to...uh...manage file versions?
> 
> Just an idea.


Maybe Heiner will say more, but in the past, we have had the external tarballs 
in the VCS, but then we moved them out and it worked very well. There never was 
a reason to track external.tar.gz files in VCS, because we do not change them.
-- 
Pavel Janík





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Pedro Giffuni

+1
- This will make it easier to update the BSD/MIT unrestricted stuff.
- Hopefully it also means we will eventually stop depending on GNU
  patch for the build.

Welcome Oliver!
Great job Juergen: it's the first code replacement and a very
necessary one for OO forks too (unless they want to carry
lcc's copyright;) ).

cheers,

Pedro.

On Tue, 20 Sep 2011 15:44:59 +0200, Pavel Janík  wrote:

Hi,


I like this idea.

From a developer point of view I only have to checkout "ext_sources" 
once and reference it from all my "trunks" using the already existing 
configure-switch 'with-external-tar=""'


when we have such a repository, we will surely modify the current
sources so that you don't have to add such a switch; ../ext_sources
will be picked up automatically.

BTW - welcome! :-)




Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Rob Weir
2011/9/20 Pavel Janík :
>> Would we be able to do this?  What if the flaw was related to code in
>> ext_sources?
>
> Then we patch it. Patch will be in the trunk/main, as always.
>
>> And if not us, in the project, what if some "downstream consumer" of
>> AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever.  But
>> we've already updated ext_sources for AOOo 4.0?
>
> Versions - we can and will have more tarballs of one external source.
>

Have we ever considered using version control to...uh...manage file versions?

Just an idea.

> This all is already solved.
> --
> Pavel Janík
>
>
>
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Armin Le Grand

On 20.09.2011 15:58, Rob Weir wrote:

On Tue, Sep 20, 2011 at 9:48 AM, Armin Le Grand  wrote:

On 20.09.2011 15:33, Oliver-Rainer Wittmann wrote:


Hi,

On 20.09.2011 14:37, Jürgen Schmidt wrote:


...


What do others think about a structure where we have "ext_sources"
besides
"trunk".

incubator/ooo/trunk
incubator/ooo/ext_source
...


So are we saying we would never need to branch or tag these files?

For example, suppose we release AOOo 3.4.0, and then later we release AOOo 4.0.

Then someone finds a serious security flaw in AOOo 3.4.0, and we
decide to release an AOOo 3.4.1 as well as a AOOo 4.0.1.

Would we be able to do this?  What if the flaw was related to code in
ext_sources?

And if not us, in the project, what if some "downstream consumer" of
AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever.  But
we've already updated ext_sources for AOOo 4.0?

In other words, how do we track, in SVN, a compatible set of matching
trunk/ and ext_source/ revisions, so we (or someone else) can recreate
any released version of AOOo?


Good point. Thus, it should be part of incubator/ooo/trunk, something like:

incubator/ooo/trunk/main
incubator/ooo/trunk/extras
incubator/ooo/trunk/ext_sources

It could be in a repo of its own, but this would just bring up the risk of 
not using the same tags in both (by purpose or by error).


Indeed, looks as if it has to be a part of trunk somehow. Not very nice 
for binaries.


Maybe we could find an intermediate place for them as long as we still 
need to change them pretty often. Currently we will have to do some 
adds/removes/changes to it. It could be good to add them to trunk after it 
has stabilized a little more.



-Rob





I like this idea.

  From a developer point of view I only have to checkout "ext_sources"
once and reference it from all my "trunks" using the already existing
configure-switch 'with-external-tar=""'


+1

Also, hopefully ext_sources will not change too much (after a consolidation
phase) and it's mostly binaries, thus not too well suited for a repository.
Let's not extend our main repository with those binaries, please.


Best regards, Oliver.



Regards,
Armin
--
ALG









Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Pavel Janík
> Would we be able to do this?  What if the flaw was related to code in
> ext_sources?

Then we patch it. Patch will be in the trunk/main, as always.

> And if not us, in the project, what if some "downstream consumer" of
> AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever.  But
> we've already updated ext_sources for AOOo 4.0?

Versions - we can and will have more tarballs of one external source.

This all is already solved.
-- 
Pavel Janík





Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Rob Weir
On Tue, Sep 20, 2011 at 9:48 AM, Armin Le Grand  wrote:
> On 20.09.2011 15:33, Oliver-Rainer Wittmann wrote:
>>
>> Hi,
>>
>> On 20.09.2011 14:37, Jürgen Schmidt wrote:
>
> ...
>>>
>>> What do others think about a structure where we have "ext_sources"
>>> besides
>>> "trunk".
>>>
>>> incubator/ooo/trunk
>>> incubator/ooo/ext_source
>>> ...

So are we saying we would never need to branch or tag these files?

For example, suppose we release AOOo 3.4.0, and then later we release AOOo 4.0.

Then someone finds a serious security flaw in AOOo 3.4.0, and we
decide to release an AOOo 3.4.1 as well as a AOOo 4.0.1.

Would we be able to do this?  What if the flaw was related to code in
ext_sources?

And if not us, in the project, what if some "downstream consumer" of
AOOo 3.4.0 wants to rebuild 3.4.0 later, for a patch or whatever.  But
we've already updated ext_sources for AOOo 4.0?

In other words, how do we track, in SVN, a compatible set of matching
trunk/ and ext_source/ revisions, so we (or someone else) can recreate
any released version of AOOo?

-Rob

>>>
>>
>> I like this idea.
>>
>>  From a developer point of view I only have to checkout "ext_sources"
>> once and reference it from all my "trunks" using the already existing
>> configure-switch 'with-external-tar=""'
>
> +1
>
> Also, hopefully ext_sources will not change too much (after a consolidation
> phase) and it's mostly binaries, thus not too well suited for a repository.
> Let's not extend our main repository with those binaries, please.
>
>> Best regards, Oliver.
>>
>
> Regards,
>        Armin
> --
> ALG
>
>


Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Armin Le Grand

On 20.09.2011 15:33, Oliver-Rainer Wittmann wrote:

Hi,

On 20.09.2011 14:37, Jürgen Schmidt wrote:

...

What do others think about a structure where we have "ext_sources"
besides
"trunk".

incubator/ooo/trunk
incubator/ooo/ext_source
...



I like this idea.

 From a developer point of view I only have to checkout "ext_sources"
once and reference it from all my "trunks" using the already existing
configure-switch 'with-external-tar=""'


+1

Also, hopefully ext_sources will not change too much (after a 
consolidation phase) and it's mostly binaries, thus not too well suited 
for a repository. Let's not extend our main repository with those 
binaries, please.



Best regards, Oliver.



Regards,
Armin
--
ALG



Re: handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Pavel Janík
Hi,

> I like this idea.
> 
> From a developer point of view I only have to checkout "ext_sources" once and 
> reference it from all my "trunks" using the already existing configure-switch 
> 'with-external-tar=""'

when we have such a repository, we will surely modify the current sources so 
that you don't have to add such a switch; ../ext_sources will be picked up 
automatically.

BTW - welcome! :-)
-- 
Pavel Janík





handling of ext_sources - Juergen's suggestion [was: Re: A systematic approach to IP review?]

2011-09-20 Thread Oliver-Rainer Wittmann

Hi,

On 20.09.2011 14:37, Jürgen Schmidt wrote:

On Mon, Sep 19, 2011 at 1:59 PM, Rob Weir  wrote:


2011/9/19 Jürgen Schmidt:

On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir  wrote:


...

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.



do you mean to check in the files under ext_source into svn and remove them
later on when we have cleaned up the code? Or do you mean to put them
somewhere on Apache Extras?
I would prefer to save these binary files under Apache Extras if possible.




Why not just keep it in SVN?   Moving things to Apache-Extras does not
help us with the IP review.   In other words, if we have a dependency
on a OSS module that has an incompatible license, then moving that
module to Apache Extras does not make that dependency go away.  We
still need to understand the nature of the dependency: a build tool, a
dynamic runtime dependency, a statically linked library, an optional
extensions, a necessary core module.

If we find out, for example, that something in ext-sources is only
used as a build tool, and is not part of the release, then there is
nothing that prevents us from hosting it in SVN.   But if something is
a necessary library and it is under GPL, then this is a problem even
if we store it on Apache-Extras,

i am not really happy with all the binaries in the trunk tree because of
the large binary blobs and i don't expect too many changes of these
dependencies. And i would like to avoid checking them out every time.

What do others think about a structure where we have "ext_sources" besides
"trunk".

incubator/ooo/trunk
incubator/ooo/ext_source
...



I like this idea.

From a developer point of view I only have to checkout "ext_sources" 
once and reference it from all my "trunks" using the already existing 
configure-switch 'with-external-tar=""'


Best regards, Oliver.


Re: A systematic approach to IP review?

2011-09-20 Thread Jürgen Schmidt
On Tue, Sep 20, 2011 at 2:34 PM, Shane Curcuru  wrote:

> So... has anyone actually run Apache RAT yet?  It has a scan-only mode
> which I'd think would be the simplest place to start.
>
It's on my todo list to take a look at it; probably I will come back with
questions.

Juergen


> Personally, I'd recommend working on basic RAT scans, with the scripts to
> run them and any exception rules (for known files, etc.) all checked into
> SVN with the build tools for the code.  But hey, it's easy for me to suggest
> "we" do stuff, when I only currently have time to be a mentor and thus can
> get away with just making suggestions.  8-)
>
> I like the general concept of storing the IP type for files in SVN
> properties; although properties are easy to change, Apache does have a
> strong history of being able to provide oversight for commit logs throughout
> a project's history.
>
> - Shane
>


Re: A systematic approach to IP review?

2011-09-20 Thread Jürgen Schmidt
On Mon, Sep 19, 2011 at 1:59 PM, Rob Weir  wrote:

> 2011/9/19 Jürgen Schmidt :
> > On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir  wrote:
> >
> >> If you haven't looked at it closely, it is probably worth a few minutes
> >> of your time to review our incubation status page, especially the
> >> items under "Copyright" and "Verify Distribution Rights".  It lists
> >> the things we need to do, including:
> >>
> >>  -- Check and make sure that the papers that transfer rights to the
> >> ASF have been received. It is only necessary to transfer rights for the
> >> package, the core code, and any new code produced by the project.
> >>
> >> -- Check and make sure that the files that have been donated have been
> >> updated to reflect the new ASF copyright.
> >>
> >> -- Check and make sure that for all code included with the
> >> distribution that is not under the Apache license, we have the right
> >> to combine with Apache-licensed code and redistribute.
> >>
> >> -- Check and make sure that all source code distributed by the project
> >> is covered by one or more of the following approved licenses: Apache,
> >> BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
> >> the same terms.
> >>
> >> Some of this is already going on, but it is hard to get a sense of who
> >> is doing what and how much progress we have made.  I wonder if we can
> >> agree to a more systematic approach?  This will make it easier to see
> >> the progress we're making and it will also make it easier for others
> >> to help.
> >>
> >> Suggestions:
> >>
> >> 1) We need to get all files needed for the build into SVN.  Right now
> >> there are some that are copied down from the OpenOffice.org website
> >> during the build's bootstrap process.   Until we get the files all in
> >> one place it is hard to get a comprehensive view of our dependencies.
> >>
> >
> > do you mean to check in the files under ext_source into SVN and remove it
> > later on when we have cleaned up the code? Or do you mean to put it
> > somewhere on Apache Extras?
> > I would prefer to save these binary files under Apache Extras if possible.
> >
>
>
> Why not just keep it in SVN?   Moving things to Apache-Extras does not
> help us with the IP review.   In other words, if we have a dependency
> on an OSS module that has an incompatible license, then moving that
> module to Apache Extras does not make that dependency go away.  We
> still need to understand the nature of the dependency: a build tool, a
> dynamic runtime dependency, a statically linked library, an optional
> extension, a necessary core module.
>
> If we find out, for example, that something in ext-sources is only
> used as a build tool, and is not part of the release, then there is
> nothing that prevents us from hosting it in SVN.   But if something is
> a necessary library and it is under GPL, then this is a problem even
> if we store it on Apache-Extras,
>
I am not really happy with all the binaries in the trunk tree because of
the large binary blobs, and I don't expect too many changes of these
dependencies. And I would like to avoid checking them out every time.

What do others think about a structure where we have "ext_sources" alongside
"trunk"?

incubator/ooo/trunk
incubator/ooo/ext_source
...

If we can agree on such a structure I would move forward to bring in some
new external sources. The proposed ucpp preprocessor -> BSD license, used in
idlc and of course part of the SDK later on. I made some tests with it
and was able to build the sources on Windows in our Cygwin environment with
a new GNU make file. I was also able to build udkapi and offapi with this
new and adapted idlc/ucpp without any problems -> the generated type library
is equal to the old one.

I have to run some more tests on other platforms as soon as I have other
platforms available for testing. I decided to replace the preprocessor
instead of removing it for compatibility reasons, and it was of course
the easier change. The next step is to check how the process with
ext_sources works in detail in our build process and adapt the new ucpp
module. If anybody is familiar with ext_sources and can point me to
potential hurdles, please let me know (on a new thread) ;-)
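
For anybody who wants to repeat the type library comparison, something like
this should work (paths are illustrative; regview is the SDK tool that dumps
the contents of a type registry):

  regview old/types.rdb > old.dump
  regview new/types.rdb > new.dump
  diff old.dump new.dump && echo "type libraries match"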

Juergen


>
> >
> >>
> >> 2) Continue the CWS integrations.  Along with 1) this ensures that all
> >> the code we need for the release is in SVN.
> >>
> >> 3)  Files that Oracle include in their SGA need to have the Apache
> >> license header inserted and the Sun/Oracle copyright migrated to the
> >> NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
> >> automate parts of this.
> >>
> >> 4) Once the SGA files have the Apache headers, then we can make
> >> regular use of RAT to report on files that are lacking an Apache
> >> header.  Such files might be in one of the following categories:
> >>
> >> a) Files that Oracle owns the copyright on and which should be
> >> included in an amended SGA
> >>
> >> b) Files that have a compatible OSS license which we are permitted to
> >> use.  This might require that we add a mention of it to the NOTICE
> >> file. [ ... ]

Re: A systematic approach to IP review?

2011-09-20 Thread Shane Curcuru
So... has anyone actually run Apache RAT yet?  It has a scan-only mode 
which I'd think would be the simplest place to start.


Personally, I'd recommend working on basic RAT scans, with the scripts 
to run them and any exception rules (for known files, etc.) all checked 
into SVN with the build tools for the code.  But hey, it's easy for me 
to suggest "we" do stuff, when I only currently have time to be a mentor 
and thus can get away with just making suggestions.  8-)


I like the general concept of storing the IP type for files in SVN 
properties; although properties are easy to change, Apache does have a 
strong history of being able to provide oversight for commit logs 
throughout a project's history.


- Shane


Re: A systematic approach to IP review?

2011-09-20 Thread Jürgen Schmidt
On Mon, Sep 19, 2011 at 7:05 PM, Rob Weir  wrote:

> On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo) 
> wrote:
> > On 09/19/2011 04:47 PM, Rob Weir wrote:
> >>
> >> On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)
> >>  wrote:
> >>>
> >>> On 09/19/2011 01:59 PM, Rob Weir wrote:
> 
>  2011/9/19 Jürgen Schmidt:
> >
> > On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir wrote:
> >
> >> If you haven't looked at it closely, it is probably worth a few minutes
> >> of your time to review our incubation status page, especially the
> >> items under "Copyright" and "Verify Distribution Rights".  It lists
> >> the things we need to do, including:
> >>
> >>  -- Check and make sure that the papers that transfer rights to the
> >> ASF have been received. It is only necessary to transfer rights for the
> >> package, the core code, and any new code produced by the project.
> >>
> >> -- Check and make sure that the files that have been donated have been
> >> updated to reflect the new ASF copyright.
> >>
> >> -- Check and make sure that for all code included with the
> >> distribution that is not under the Apache license, we have the right
> >> to combine with Apache-licensed code and redistribute.
> >>
> >> -- Check and make sure that all source code distributed by the project
> >> is covered by one or more of the following approved licenses: Apache,
> >> BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
> >> the same terms.
> >>
> >> Some of this is already going on, but it is hard to get a sense of who
> >> is doing what and how much progress we have made.  I wonder if we can
> >> agree to a more systematic approach?  This will make it easier to see
> >> the progress we're making and it will also make it easier for others
> >> to help.
> >>
> >> Suggestions:
> >>
> >> 1) We need to get all files needed for the build into SVN.  Right now
> >> there are some that are copied down from the OpenOffice.org website
> >> during the build's bootstrap process.   Until we get the files all in
> >> one place it is hard to get a comprehensive view of our dependencies.
> >>
> >
> > do you mean to check in the files under ext_source into SVN and remove it
> > later on when we have cleaned up the code? Or do you mean to put it
> > somewhere on Apache Extras?
> > I would prefer to save these binary files under Apache Extras if possible.
> >
> 
> 
>  Why not just keep it in SVN?   Moving things to Apache-Extras does not
>  help us with the IP review.   In other words, if we have a dependency
>  on an OSS module that has an incompatible license, then moving that
>  module to Apache Extras does not make that dependency go away.  We
>  still need to understand the nature of the dependency: a build tool, a
>  dynamic runtime dependency, a statically linked library, an optional
>  extension, a necessary core module.
> 
>  If we find out, for example, that something in ext-sources is only
>  used as a build tool, and is not part of the release, then there is
>  nothing that prevents us from hosting it in SVN.   But if something is
>  a necessary library and it is under GPL, then this is a problem even
>  if we store it on Apache-Extras,
> 
> 
> >
> >>
> >> 2) Continue the CWS integrations.  Along with 1) this ensures that all
> >> the code we need for the release is in SVN.
> >>
> >> 3)  Files that Oracle include in their SGA need to have the Apache
> >> license header inserted and the Sun/Oracle copyright migrated to the
> >> NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
> >> automate parts of this.
> >>
> >> 4) Once the SGA files have the Apache headers, then we can make
> >> regular use of RAT to report on files that are lacking an Apache
> >> header.  Such files might be in one of the following categories:
> >>
> >> a) Files that Oracle owns the copyright on and which should be
> >> included in an amended SGA
> >>
> >> b) Files that have a compatible OSS license which we are permitted to
> >> use.  This might require that we add a mention of it to the NOTICE
> >> file.
> >>
> >> c) Files that have an incompatible OSS license.  These need to be
> >> removed/replaced.
> >>
> >> d) Files that have an OSS license that has not yet been
> >> reviewed/categorized by Apache legal affairs.  In that case we need to
> >> bring it to their attention.
> >>
> >> e) (Hypothetically) files that are not under an OSS license at all.
> >> E.g., a Microsoft header file.  These must be removed.
> >>
> >> 5) We should track the resolution of each file, and do this
> >> publicly.  The audit trail is important.  Some ways we could do this
> >> might be: [ ... ]

Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Mon, Sep 19, 2011 at 4:32 PM, Dennis E. Hamilton
 wrote:
> I agree that there is no escape from managing down to the individual file.  
> It is a question of organization now, where the entire base is involved.
>

RAT or something RAT-like.

> Later, if the svn:property is to be trusted, the problem is quite different, 
> it seems to me.  Plus the rules are understood and provenance and IP are 
> likely handled as anything needing clearance enters the code base.  What is 
> done to ensure a previously-vetted code base has not become tainted strikes 
> me as a kind of regression/smoke test.
>

Here is how I see SVN properties and RAT relating.   Any use of a
grep-like RAT-like tool will need to deal with exceptions.  We're
going to have stuff like binary files, say ODF files that are used for
testing, that don't have a "header".  Or files that are used only as a
build tool, checked in for convenience, but are not part of the
release.  Or 3rd party code that does not have a header, but we know
its origin, like the ICU breakiterator data files.

How do we deal with those types of files, in the context of an
automated audit tool?  One solution is to record in a big config file
or script a list of all of these exceptions.  Essentially, a list of
files to ignore in the RAT scan.

That approach would certainly work, but would be fragile.  Moving or
renaming the files would break our script.  Not the end of the world,
since this could be designed to be "fail safe" and give us errors on
the files that moved.

But if we track this info in SVN, then we could generate the exclusion
list from SVN, so it automatically adjusts as files are moved or
renamed.  It also avoids the problem -- and this might just be my own
engineering aesthetic -- of tracking metadata for files in two
different places.  It seems rather untidy to me.
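
A minimal sketch of what I mean (the ip:category property name and its
values are hypothetical, nothing agreed on yet):

  # 'svn propget -R' prints one "PATH - VALUE" line per file that has
  # the property, so the exclusion list can be derived, not hand-kept.
  svn propget -R ip:category . \
      | grep -e '- not-released$' \
      | sed 's/ - not-released$//' > rat-excludes.txt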

From a regression standpoint, you could treat all files as being in
one of several states:

1) Unexamined (no property set)

2) Apache 2.0 (included in the Oracle SGA or new code contributed by
committer or other person under iCLA)

3) Compatible 3rd party license

4) Incompatible 3rd party license

5) Not part of release

The goal would be to iterate until every file is in category 2, 3 or 5.
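
Recording and querying such a state could look like this (the property name,
its values, and the file paths are all made up for illustration):

  svn propset ip:category apache2      main/sw/source/core/doc/docnew.cxx
  svn propset ip:category 3rd-compat   main/icu/makefile.mk
  svn propset ip:category not-released soltools/mkdepend/main.c
  svn commit -m "Record IP review state as SVN properties"

  # any file that 'svn propget -R ip:category .' does not list is
  # still unexamined (state 1)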

> It is in that regard that I am concerned the tools for this one-time case 
> need not be the same as for future cases.
>

There are two kinds of future cases:

1) Code contributed in small chunks by committers or patches, where we
can expect CTR to work.  There will be errors, but we can catch those
before we do subsequent releases via RAT.

2) Larger contributions made by SGA.  For example, the IBM Lotus
Symphony contribution, or other similar corporate contributions.  When
an Apache project receives a large code contribution like this they
need to do an IP clearance process on that contribution as well.   I
think that the RAT/SVN combination could work well here also.  The
goal would be to clear the IP on the new contributions before we start
copying or merging it into the core AOOo code.


> And, since I am not doing the work in the present case, I am offering this as 
> something to think about, not a position.
>
>  - Dennis
>


RE: A systematic approach to IP review?

2011-09-19 Thread Dennis E. Hamilton
I agree that there is no escape from managing down to the individual file.  It 
is a question of organization now, where the entire base is involved.

Later, if the svn:property is to be trusted, the problem is quite different, it 
seems to me.  Plus the rules are understood and provenance and IP are likely 
handled as anything needing clearance enters the code base.  What is done to 
ensure a previously-vetted code base has not become tainted strikes me as a 
kind of regression/smoke test.

It is in that regard that I am concerned the tools for this one-time case need 
not be the same as for future cases.

And, since I am not doing the work in the present case, I am offering this as 
something to think about, not a position.

 - Dennis

-Original Message-
From: Rob Weir [mailto:robw...@apache.org] 
Sent: Monday, September 19, 2011 09:55
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

[ ... ]

The granularity we need to worry about is the file.  That is the
finest grain level of having a license header.  That is the unit of
tracking in SVN.  That is the unit that someone could have changed the
content in SVN.

Again, it is fine if someone wants to outline this at the module
level.  But that does not eliminate the requirement for us to do this
at the file level as well.

[ ... ]



RE: A systematic approach to IP review?

2011-09-19 Thread Dennis E. Hamilton
I hope that RAT can produce a list of OK files and exclude the not-OK ones on
the first use, since the list of not-OK files would overwhelm everything else
about the current repository.

 - Dennis

-Original Message-
From: Marcus (OOo) [mailto:marcus.m...@wtnet.de] 
Sent: Monday, September 19, 2011 10:27
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

On 09/19/2011 07:05 PM, Rob Weir wrote:
[ ... ]

> 3) Running RAT against the source is how we ensure that the code is clean

OK, I don't know what this can do for us. Maybe it's the solution for 
the problem.

How do you know that it is not skipping anything? I guess you simply 
would trust RAT that it is doing fine, right? ;-)

BTW:
Does RAT produce a log file, so that we have a list of every file that 
was checked? This could be very helpful.

Marcus
[ ... ]



RE: A systematic approach to IP review?

2011-09-19 Thread Dennis E. Hamilton
I agree running RAT is important ...

I haven't heard any suggestion that such an important tool not be used.

-Original Message-
From: Rob Weir [mailto:robw...@apache.org] 
Sent: Monday, September 19, 2011 10:05
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

[ ... ]

I think the wiki is fine as a collaboration tool, to list tasks and
who is working on them.  But that is not a substitute for running
scans with the Apache Release Audit Tool (RAT) and working toward a
clean report.

Think of it this way:

1) We have a list of modules on the wiki that we need to replace.
Great.  Developers can work on that list.

2) But how do we know that the list on the wiki is complete?  How do
we know that it is not missing anything?

3) Running RAT against the source is how we ensure that the code is clean

In other words, the criteria should be that we have a clean RAT
record, not that we have a clean wiki.  The list of modules on the
wiki is not traceable to a scan of the source code.  It is not
reproducible.  It might be useful.  But it is not sufficient.
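
A first pass can be as simple as this (the jar name and version are
illustrative; RAT writes its report, including every file it visited, to
stdout):

  java -jar apache-rat-0.8.jar . > rat-report.txt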

-Rob

[ ... ]



Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Mon, Sep 19, 2011 at 1:19 PM, Marcus (OOo)  wrote:
> On 09/19/2011 06:54 PM, Rob Weir wrote:
>>
>> On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton
>>   wrote:
>>>
>>> Rob,
>>>
>>> I was reading the suggestion from Marcus as it being that since the code
>>> base is in a folder structure (modularized) and the wiki can map folder
>>> structures and their status nicely, it is not necessary to have a single
>>> table to manage this from, but have any tables be at some appropriate
>>> granularity toward the leaves of the hierarchy (on the wiki).
>>>
>>
>> Using the wiki for this might be useful for tracking the status of
>> modules we already know we need to replace.  Bugzilla would be another
>> way to track the status.
>
> How do you want to use Bugzilla to track thousands of files?
>

No.  But for tracking module review, Bugzilla might be better than the
wiki.  It allows us to have a conversation on each module via
comments.

>> But it is not really a sufficient solution.  Why?  Because it is not
>> tied to the code and is not reproducible.  How was the list of
>> components listed in the wiki generated?  Based on what script?  Where
>> is the script?  How do we know it is accurate and current?  How do we
>> know that integrating a CWS does not make that list become outdated?
>> How do we prove to ourselves that we did this right?  And how do we
>> record that proof?  And how do we repeat this proof every
>> time we do a new release?
>
> Questions upon questions, but not helpful. ;-)
>
>> A list of components of unknown derivation sitting on a community wiki
>> that anyone can edit is not really a suitable basis for an IP review.
>
> Then restrict the write access.
>
>> The granularity we need to worry about is the file.  That is the
>> finest grain level of having a license header.  That is the unit of
>> tracking in SVN.  That is the unit that someone could have changed the
>> content in SVN.
>>
>> Again, it is fine if someone wants to outline this at the module
>> level.  But that does not eliminate the requirement for us to do this
>> at the file level as well.
>
> IMHO you haven't understood what I wanted to tell you.
>

I understand what you are saying.  I just don't agree with you.

> Sure it makes no sense to create a list of every file in SVN to see if the
> license is good or bad. So, do it module by module. And when a module is
> marked as "done", then of course every file in the modules was checked.
> Otherwise it's not working.
>

That is not a consistent approach. Every developer applies their own
criteria.   It is not reproducible. It leaves no audit trail.  And it
doesn't help us with the next release.

If you use the Apache Release Audit Tool (RAT) then it will check all
the files automatically.

> And how do we make sure that there was no change when source was
> added/moved/improved? Simply Commit Then Review (CTR). A change in the
> license header at the top of a file should be noticeable, right? However, we
> also need to have trust in everybody's work.
>

We would run RAT before every release and with every significant code
contribution.

You can think of this as a form of CTR, but one that is automated,
with a consistent rule set.

Obviously, good CTR plus the work on the wiki will all help.  But we
need the RAT scans as well, to show that we're clean.
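
A sketch of what such an automated run could look like, assuming a RAT build
whose command line accepts an exclusion file via -E and a directory via -d:

  #!/bin/sh
  # Every exclusion carries a documented reason, so the audit trail
  # lives in SVN right next to the build tools.
  {
    echo 'ext_sources/*'   # 3rd-party tarballs, cleared separately
    echo '*.odt'           # binary test documents, cannot carry headers
  } > rat-excludes.txt
  java -jar apache-rat-0.8.jar -E rat-excludes.txt -d . > rat-report.txt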

> BTW:
> What is your plan to track every file to make sure the license is OK?
>

Run RAT.  That is what it does.

> Marcus
>
>
>
>>> I can see some brittle cases, especially in the face of refactoring.  The
>>> use of the wiki might have to be an ephemeral activity that is handled this
>>> way entirely for our initial scrubbing.
>>>
>>> Ideally, additional and sustained review would be in the SVN with the
>>> artifacts so reviewed, and coalesced somehow.  The use of SVN properties is
>>> interesting, but they are rather invisible and I have a question about what
>>> happens with them when a commit happens against the particular artifact.
>>>
>>
>> Properties stick with the file, unless changed.  Think of the
>> svn:eol-style property.  It is not wiped out with a new revision of
>> the file.
>>
>>> It seems that there is some need to balance an immediate requirement and
>>> what would be sufficient for it versus what would assist us in the longer
>>> term.  It would be interesting to know what the additional-review work has
>>> become for other projects that have a substantial code base (e.g., SVN
>>> itself, httpd, ...).  I have no idea. [ ... ]

Re: A systematic approach to IP review?

2011-09-19 Thread Marcus (OOo)

On 09/19/2011 07:05 PM, Rob Weir wrote:

On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo)  wrote:

On 09/19/2011 04:47 PM, Rob Weir wrote:


On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)
  wrote:


On 09/19/2011 01:59 PM, Rob Weir wrote:


2011/9/19 Jürgen Schmidt:


On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir  wrote:


If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under "Copyright" and "Verify Distribution Rights".  It lists
the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more of the following approved licenses: Apache,
BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
the same terms.

Some of this is already going on, but it is hard to get a sense of who
is doing what and how much progress we have made.  I wonder if we can
agree to a more systematic approach?  This will make it easier to see
the progress we're making and it will also make it easier for others
to help.

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.



do you mean to check in the files under ext_source into SVN and remove it
later on when we have cleaned up the code? Or do you mean to put it
somewhere on Apache Extras?
I would prefer to save these binary files under Apache Extras if possible.




Why not just keep it in SVN?   Moving things to Apache-Extras does not
help us with the IP review.   In other words, if we have a dependency
on an OSS module that has an incompatible license, then moving that
module to Apache Extras does not make that dependency go away.  We
still need to understand the nature of the dependency: a build tool, a
dynamic runtime dependency, a statically linked library, an optional
extension, a necessary core module.

If we find out, for example, that something in ext-sources is only
used as a build tool, and is not part of the release, then there is
nothing that prevents us from hosting it in SVN.   But if something is
a necessary library and it is under GPL, then this is a problem even
if we store it on Apache-Extras,






2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.

3)  Files that Oracle include in their SGA need to have the Apache
license header inserted and the Sun/Oracle copyright migrated to the
NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
automate parts of this.

4) Once the SGA files have the Apache headers, then we can make
regular use of RAT to report on files that are lacking an Apache
header.  Such files might be in one of the following categories:

a) Files that Oracle owns the copyright on and which should be
included in an amended SGA

b) Files that have a compatible OSS license which we are permitted to
use.  This might require that we add a mention of it to the NOTICE
file.

c) Files that have an incompatible OSS license.  These need to be
removed/replaced.

d) Files that have an OSS license that has not yet been
reviewed/categorized by Apache legal affairs.  In that case we need to
bring it to their attention.

e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.

5) We should track the resolution of each file, and do this
publicly.  The audit trail is important.  Some ways we could do this
might be:

a) Track this in SVN properties.  So set ip:sga for the SGA files,
ip:mit for files that are MIT licensed, etc.  This should be reflected
in headers as well, but this is not always possible.  For example, we
might have binary files where we cannot add headers, or cases where
the OSS files do not have headers, but where we can prove their
provenance via other means.

b) Track this in a spreadsheet, one row per file.

c) Track this in a text log file checked into SVN

d) Track this in an annotated script that runs RAT, where the
annotations document the reason for cases where we tell it to ignore a
file or directory.

6) Iterate until we have a clean RAT report.

7) Goal should be for anyone today to be able to see what work remains
for IP clearance, as well as for someone 5 years from now to be able
to tell what we did. [ ... ]

Re: A systematic approach to IP review?

2011-09-19 Thread Marcus (OOo)

On 09/19/2011 06:54 PM, Rob Weir wrote:

On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton
  wrote:

Rob,

I was reading the suggestion from Marcus as it being that since the code base 
is in a folder structure (modularized) and the wiki can map folder structures 
and their status nicely, it is not necessary to have a single table to manage 
this from, but have any tables be at some appropriate granularity toward the 
leaves of the hierarchy (on the wiki).



Using the wiki for this might be useful for tracking the status of
modules we already know we need to replace.  Bugzilla would be another
way to track the status.


How do you want to use Bugzilla to track thousands of files?


But it is not really a sufficient solution.  Why?  Because it is not
tied to the code and is not reproducible.  How was the list of
components listed in the wiki generated?  Based on what script?  Where
is the script?  How do we know it is accurate and current?  How do we
know that integrating a CWS does not make that list become outdated?
How do we prove to ourselves that we did this right?  And how do we
record that proof?  And how do we repeat this proof every
time we do a new release?


Questions upon questions, but not helpful. ;-)


A list of components of unknown derivation sitting on a community wiki
that anyone can edit is not really a suitable basis for an IP review.


Then restrict the write access.


The granularity we need to worry about is the file.  That is the
finest grain level of having a license header.  That is the unit of
tracking in SVN.  That is the unit that someone could have changed the
content in SVN.

Again, it is fine if someone wants to outline this at the module
level.  But that does not eliminate the requirement for us to do this
at the file level as well.


IMHO you haven't understood what I wanted to tell you.

Sure it makes no sense to create a list of every file in SVN to see if 
the license is good or bad. So, do it module by module. And when a 
module is marked as "done", then of course every file in the module was 
checked. Otherwise it's not working.


And how do we make sure that there was no change when source was 
added/moved/improved? Simply Commit Then Review (CTR). A change in the 
license header at the top of a file should be noticeable, right? However, we 
also need to have trust in everybody's work.


BTW:
What is your plan to track every file to make sure the license is OK?

Marcus




I can see some brittle cases, especially in the face of refactoring.  The use 
of the wiki might have to be an ephemeral activity that is handled this way 
entirely for our initial scrubbing.

Ideally, additional and sustained review would be in the SVN with the artifacts 
so reviewed, and coalesced somehow.  The use of SVN properties is interesting, 
but they are rather invisible and I have a question about what happens with 
them when a commit happens against the particular artifact.



Properties stick with the file, unless changed.  Think of the
svn:eol-style property.  It is not wiped out with a new revision of
the file.


It seems that there is some need to balance an immediate requirement and what 
would be sufficient for it versus what would assist us in the longer term.  It 
would be interesting to know what the additional-review work has become for 
other projects that have a substantial code base (e.g., SVN itself, httpd, 
...).  I have no idea.



The IP review needs to occur with every release.  So the work we do to
automate this, and make it data-driven, will repay itself with every
release.

I invite you to investigate what other projects do.  When you do I
think you will agree.


  - Dennis

-Original Message-
From: Rob Weir [mailto:robw...@apache.org]
Sent: Monday, September 19, 2011 07:47
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)  wrote:

On 09/19/2011 01:59 PM, Rob Weir wrote:


2011/9/19 Jürgen Schmidt:


On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir wrote:


If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under "Copyright" and "Verify Distribution Rights".  It lists
the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more of the following approved licenses: Apache,
BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
the same terms. [ ... ]

Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo)  wrote:
> On 09/19/2011 04:47 PM, Rob Weir wrote:
>>
>> On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)
>>  wrote:
>>>
> >>> On 09/19/2011 01:59 PM, Rob Weir wrote:

 2011/9/19 Jürgen Schmidt:
>
> On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir    wrote:
>
>> If you haven't looked at it closely, it is probably worth a few minutes
>> of your time to review our incubation status page, especially the
>> items under "Copyright" and "Verify Distribution Rights".  It lists
>> the things we need to do, including:
>>
>>  -- Check and make sure that the papers that transfer rights to the
>> ASF have been received. It is only necessary to transfer rights for the
>> package, the core code, and any new code produced by the project.
>>
>> -- Check and make sure that the files that have been donated have been
>> updated to reflect the new ASF copyright.
>>
>> -- Check and make sure that for all code included with the
>> distribution that is not under the Apache license, we have the right
>> to combine with Apache-licensed code and redistribute.
>>
>> -- Check and make sure that all source code distributed by the project
>> is covered by one or more of the following approved licenses: Apache,
>> BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
>> the same terms.
>>
>> Some of this is already going on, but it is hard to get a sense of who
>> is doing what and how much progress we have made.  I wonder if we can
>> agree to a more systematic approach?  This will make it easier to see
>> the progress we're making and it will also make it easier for others
>> to help.
>>
>> Suggestions:
>>
>> 1) We need to get all files needed for the build into SVN.  Right now
>> there are some that are copied down from the OpenOffice.org website
>> during the build's bootstrap process.   Until we get the files all in
>> one place it is hard to get a comprehensive view of our dependencies.
>>
>
> do you mean to check in the files under ext_source into SVN and remove it
> later on when we have cleaned up the code? Or do you mean to put it
> somewhere on Apache Extras?
> I would prefer to save these binary files under Apache Extras if possible.
>


 Why not just keep it in SVN?   Moving things to Apache-Extras does not
 help us with the IP review.   In other words, if we have a dependency
 on an OSS module that has an incompatible license, then moving that
 module to Apache Extras does not make that dependency go away.  We
 still need to understand the nature of the dependency: a build tool, a
 dynamic runtime dependency, a statically linked library, an optional
 extension, a necessary core module.

 If we find out, for example, that something in ext-sources is only
 used as a build tool, and is not part of the release, then there is
 nothing that prevents us from hosting it in SVN.   But if something is
 a necessary library and it is under GPL, then this is a problem even
 if we store it on Apache-Extras,


>
>>
>> 2) Continue the CWS integrations.  Along with 1) this ensures that all
>> the code we need for the release is in SVN.
>>
>> 3)  Files that Oracle include in their SGA need to have the Apache
>> license header inserted and the Sun/Oracle copyright migrated to the
>> NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
>> automate parts of this.
>>
>> 4) Once the SGA files have the Apache headers, then we can make
>> regular use of RAT to report on files that are lacking an Apache
>> header.  Such files might be in one of the following categories:
>>
>> a) Files that Oracle owns the copyright on and which should be
>> included in an amended SGA
>>
>> b) Files that have a compatible OSS license which we are permitted to
>> use.  This might require that we add a mention of it to the NOTICE
>> file.
>>
>> c) Files that have an incompatible OSS license.  These need to be
>> removed/replaced.
>>
>> d) Files that have an OSS license that has not yet been
>> reviewed/categorized by Apache legal affairs.  In that case we need to
>> bring it to their attention.
>>
>> e) (Hypothetically) files that are not under an OSS license at all.
>> E.g., a Microsoft header file.  These must be removed.
>>
>> 5) We should track the resolution of each file, and do this
>> publicly.  The audit trail is important.  Some ways we could do this
>> might be:
>>
>> a) Track this in SVN properties.  So set ip:sga for the SGA files,
>> ip:mit for files that are MIT licensed, etc.  This should be reflected
>> in headers as well, but this is not always possible.  For example, we
>> might have

Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton
 wrote:
> Rob,
>
> I was reading the suggestion from Marcus as it being that since the code base 
> is in a folder structure (modularized) and the wiki can map folder structures 
> and their status nicely, it is not necessary to have a single table to manage 
> this from, but have any tables be at some appropriate granularity toward the 
> leaves of the hierarchy (on the wiki).
>

Using the wiki for this might be useful for tracking the status of
modules we already know we need to replace.  Bugzilla would be another
way to track the status.

But it is not really a sufficient solution.  Why?  Because it is not
tied to the code and is not reproducible.  How was the list of
components listed in the wiki generated?  Based on what script?  Where
is the script?  How do we know it is accurate and current?  How do we
know that integrating a CWS does not make that list become outdated?
How do we prove to ourselves that we did this right?  And how do we
record that proof?  And how do we repeat this proof every
time we do a new release?

A list of components of unknown derivation sitting on a community wiki
that anyone can edit is not really a suitable basis for an IP review.

The granularity we need to worry about is the file.  That is the
finest grain level of having a license header.  That is the unit of
tracking in SVN.  That is the unit that someone could have changed the
content in SVN.

Again, it is fine if someone wants to outline this at the module
level.  But that does not eliminate the requirement for us to do this
at the file level as well.

> I can see some brittle cases, especially in the face of refactoring.  The use 
> of the wiki might have to be an ephemeral activity that is handled this way 
> entirely for our initial scrubbing.
>
> Ideally, additional and sustained review would be in the SVN with the 
> artifacts so reviewed, and coalesced somehow.  The use of SVN properties is 
> interesting, but they are rather invisible and I have a question about what 
> happens with them when a commit happens against the particular artifact.
>

Properties stick with the file, unless changed.  Think of the
svn:eol-style property.  It is not wiped out with a new revision of
the file.
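
A quick way to convince yourself (the file path is made up; svn:eol-style is
a standard property):

  svn propget svn:eol-style main/sw/inc/doc.hxx   # -> native
  # ...edit the file, commit it...
  svn propget svn:eol-style main/sw/inc/doc.hxx   # still -> native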

> It seems that there is some need to balance an immediate requirement and what 
> would be sufficient for it versus what would assist us in the longer term.  
> It would be interesting to know what the additional-review work has become 
> for other projects that have a substantial code base (e.g., SVN itself, 
> httpd, ...).  I have no idea.
>

The IP review needs to occur with every release.  So the work we do to
automate this, and make it data-driven, will repay itself with every
release.

I invite you to investigate what other projects do.  When you do I
think you will agree.

>  - Dennis
>
> -Original Message-
> From: Rob Weir [mailto:robw...@apache.org]
> Sent: Monday, September 19, 2011 07:47
> To: ooo-dev@incubator.apache.org
> Subject: Re: A systematic approach to IP review?
>
> On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)  wrote:
>> On 09/19/2011 01:59 PM, Rob Weir wrote:
>>>
>>> 2011/9/19 Jürgen Schmidt:
>>>>
>>>> On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir  wrote:
>>>>
>>>>> If you haven't looked at it closely, it is probably worth a few minutes
>>>>> of your time to review our incubation status page, especially the
>>>>> items under "Copyright" and "Verify Distribution Rights".  It lists
>>>>> the things we need to do, including:
>>>>>
>>>>>  -- Check and make sure that the papers that transfer rights to the
>>>>> ASF have been received. It is only necessary to transfer rights for the
>>>>> package, the core code, and any new code produced by the project.
>>>>>
>>>>> -- Check and make sure that the files that have been donated have been
>>>>> updated to reflect the new ASF copyright.
>>>>>
>>>>> -- Check and make sure that for all code included with the
>>>>> distribution that is not under the Apache license, we have the right
>>>>> to combine with Apache-licensed code and redistribute.
>>>>>
>>>>> -- Check and make sure that all source code distributed by the project
>>>>> is covered by one or more of the following approved licenses: Apache,
>>>>> BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
>>>>> the same terms.
>>>>>
>>>>> Some of this is already going on, but it is hard to get a sense of who
>>>>> is doing what and how much progress we have made. [ ... ]

Re: A systematic approach to IP review?

2011-09-19 Thread Marcus (OOo)

On 09/19/2011 04:47 PM, Rob Weir wrote:

On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)  wrote:

On 09/19/2011 01:59 PM, Rob Weir wrote:


2011/9/19 Jürgen Schmidt:


On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir wrote:


If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under "Copyright" and "Verify Distribution Rights".  It lists
the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more of the following approved licenses: Apache,
BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
the same terms.

Some of this is already going on, but it is hard to get a sense of who
is doing what and how much progress we have made.  I wonder if we can
agree to a more systematic approach?  This will make it easier to see
the progress we're making and it will also make it easier for others
to help.

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.



do you mean to check in the files under ext_source into SVN and remove it
later on when we have cleaned up the code? Or do you mean to put it
somewhere on Apache Extras?
I would prefer to save these binary files under Apache Extras if possible.




Why not just keep it in SVN?   Moving things to Apache-Extras does not
help us with the IP review.   In other words, if we have a dependency
on an OSS module that has an incompatible license, then moving that
module to Apache Extras does not make that dependency go away.  We
still need to understand the nature of the dependency: a build tool, a
dynamic runtime dependency, a statically linked library, an optional
extension, a necessary core module.

If we find out, for example, that something in ext-sources is only
used as a build tool, and is not part of the release, then there is
nothing that prevents us from hosting it in SVN.   But if something is
a necessary library and it is under GPL, then this is a problem even
if we store it on Apache-Extras,






2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.

3)  Files that Oracle include in their SGA need to have the Apache
license header inserted and the Sun/Oracle copyright migrated to the
NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
automate parts of this.

4) Once the SGA files have the Apache headers, then we can make
regular use of RAT to report on files that are lacking an Apache
header.  Such files might be in one of the following categories:

a) Files that Oracle owns the copyright on and which should be
included in an amended SGA

b) Files that have a compatible OSS license which we are permitted to
use.  This might require that we add a mention of it to the NOTICE
file.

c) Files that have an incompatible OSS license.  These need to be
removed/replaced.

d) Files that have an OSS license that has not yet been
reviewed/categorized by Apache legal affairs.  In that case we need to
bring it to their attention.

e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.

5) We should track the resolution of each file, and do this
publicly.  The audit trail is important.  Some ways we could do this
might be:

a) Track this in SVN properties.  So set ip:sga for the SGA files,
ip:mit for files that are MIT licensed, etc.  This should be reflected
in headers as well, but this is not always possible.  For example, we
might have binary files where we cannot add headers, or cases where
the OSS files do not have headers, but where we can prove their
provenance via other means.

b) Track this in a spreadsheet, one row per file.

c) Track this in a text log file checked into SVN

d) Track this in an annotated script that runs RAT, where the
annotations document the reason for cases where we tell it to ignore a
file or directory.

6) Iterate until we have a clean RAT report.

7) Goal should be for anyone today to be able to see what work remains
for IP clearance, as well as for someone 5 years from now to be able
to tell what we did.  Tracking this on the community wiki is probably
not good enough, since we've previously talked about dropping that
wiki and going to MWiki. [ ... ]

RE: A systematic approach to IP review?

2011-09-19 Thread Dennis E. Hamilton
On the wiki question, I think OOOUSERS should continue to be used for 
transition work.  Or OOODEV could be used if it needs to be limited to 
committers (perhaps the case for this activity), although it means power 
observers can't contribute there and have to do so by some other means.

This is transition work and the Confluence wiki seems like a good place for it.

The MediaWiki may be interrupted or disrupted, and it is probably a good idea
to *not* put such development-transition intensive content there.

Also, "the migrated wiki" is not the live wiki at OpenOffice.org.  So doing 
anything there will create collisions.  It is also not fully migrated in that 
it is not operating in place of what folks see via OpenOffice.org as far as I 
know.  The current Confluence wikis avoid confusion and are stable for this 
particular purpose.

 - Dennis



-Original Message-
From: Jürgen Schmidt [mailto:jogischm...@googlemail.com] 
Sent: Monday, September 19, 2011 01:45
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir  wrote:

[ ... ]

> 7) Goal should be for anyone today to be able to see what work remains
> for IP clearance, as well as for someone 5 years from now to be able
> to tell what we did.  Tracking this on the community wiki is probably
> not good enough, since we've previously talked about dropping that
> wiki and going to MWiki.
>

Talked about it, yes, but did we reach a final decision?

The migrated wiki is available at http://ooo-wiki.apache.org/wiki and can
be used. Do we want to continue with this wiki now? It's still not clear to
me at the moment.

[ ... ]



RE: A systematic approach to IP review?

2011-09-19 Thread Dennis E. Hamilton
Rob,

I was reading the suggestion from Marcus as it being that since the code base 
is in a folder structure (modularized) and the wiki can map folder structures 
and their status nicely, it is not necessary to have a single table to manage 
this from, but have any tables be at some appropriate granularity toward the 
leaves of the hierarchy (on the wiki).

I can see some brittle cases, especially in the face of refactoring.  The use 
of the wiki might have to be an ephemeral activity that is handled this way 
entirely for our initial scrubbing.

Ideally, additional and sustained review would be in the SVN with the artifacts 
so reviewed, and coalesced somehow.  The use of SVN properties is interesting, 
but they are rather invisible and I have a question about what happens with 
them when a commit happens against the particular artifact.

It seems that there is some need to balance an immediate requirement and what 
would be sufficient for it versus what would assist us in the longer term.  It 
would be interesting to know what the additional-review work has become for 
other projects that have a substantial code base (e.g., SVN itself, httpd, 
...).  I have no idea.

 - Dennis

-Original Message-
From: Rob Weir [mailto:robw...@apache.org] 
Sent: Monday, September 19, 2011 07:47
To: ooo-dev@incubator.apache.org
Subject: Re: A systematic approach to IP review?

On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)  wrote:
> On 09/19/2011 01:59 PM, Rob Weir wrote:
>>
>> 2011/9/19 Jürgen Schmidt:
>>>
>>> On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir  wrote:
>>>
>>>> If you haven't looked at it closely, it is probably worth a few minutes
>>>> of your time to review our incubation status page, especially the
>>>> items under "Copyright" and "Verify Distribution Rights".  It lists
>>>> the things we need to do, including:
>>>>
>>>>  -- Check and make sure that the papers that transfer rights to the
>>>> ASF have been received. It is only necessary to transfer rights for the
>>>> package, the core code, and any new code produced by the project.
>>>>
>>>> -- Check and make sure that the files that have been donated have been
>>>> updated to reflect the new ASF copyright.
>>>>
>>>> -- Check and make sure that for all code included with the
>>>> distribution that is not under the Apache license, we have the right
>>>> to combine with Apache-licensed code and redistribute.
>>>>
>>>> -- Check and make sure that all source code distributed by the project
>>>> is covered by one or more of the following approved licenses: Apache,
>>>> BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
>>>> the same terms.
>>>>
>>>> Some of this is already going on, but it is hard to get a sense of who
>>>> is doing what and how much progress we have made.  I wonder if we can
>>>> agree to a more systematic approach?  This will make it easier to see
>>>> the progress we're making and it will also make it easier for others
>>>> to help.
>>>>
>>>> Suggestions:
>>>>
>>>> 1) We need to get all files needed for the build into SVN.  Right now
>>>> there are some that are copied down from the OpenOffice.org website
>>>> during the build's bootstrap process.   Until we get the files all in
>>>> one place it is hard to get a comprehensive view of our dependencies.
>>>>
>>>
>>> do you mean to check in the files under ext_source into SVN and remove it
>>> later on when we have cleaned up the code? Or do you mean to put it
>>> somewhere on Apache Extras?
>>> I would prefer to save these binary files under Apache Extras if possible.
>>>
>>
>>
>> Why not just keep it in SVN?   Moving things to Apache-Extras does not
>> help us with the IP review.   In other words, if we have a dependency
>> on an OSS module that has an incompatible license, then moving that
>> module to Apache Extras does not make that dependency go away.  We
>> still need to understand the nature of the dependency: a build tool, a
>> dynamic runtime dependency, a statically linked library, an optional
>> extension, a necessary core module.
>>
>> If we find out, for example, that something in ext-sources is only
>> used as a build tool, and is not part of the release, then there is
>> nothing that prevents us from hosting it in SVN.   But if something is
>> a necessary library and it is under GPL, then this is a problem even
>> if we store it on Apache-Extras, [ ... ]

Re: A systematic approach to IP review?

2011-09-19 Thread Pedro F. Giffuni


--- On Mon, 9/19/11, Rob Weir  wrote:
...
> 2011/9/19 Jürgen Schmidt :
...
> >
> > do you mean to check in the files under ext_source into SVN and remove it
> > later on when we have cleaned up the code? Or do you mean to put it
> > somewhere on Apache Extras?
> > I would prefer to save these binary files under Apache Extras if possible.
>
> Why not just keep it in SVN?  Moving things to Apache-Extras does not
> help us with the IP review.  In other words, if we have a dependency
> on an OSS module that has an incompatible license, then moving that
> module to Apache Extras does not make that dependency go away.  We
> still need to understand the nature of the dependency: a build tool, a
> dynamic runtime dependency, a statically linked library, an optional
> extension, a necessary core module.
>

But adding in stuff that we have to remove immediately (nss,
seamonkey, ...) doesn't help either. I also think a lot of
that stuff has to be updated before being brought in: ICU
apparently would be trouble, but the Apache Commons, ICC,
and other stuff can/should be updated.



>> a) Track this in SVN properties.  So set ip:sga for the SGA files,
>> ip:mit for files that are MIT licensed, etc.


I thought we had delayed updating the copyrights in the
header to ease the CWS integration. I still hope to see
more of those, especially anything related to gnumake
(I don't know when, but dmake has to go!).

Using the SVN properties is a good idea. And we do have
to start the NOTICE file.

All just IMHO, of course.

Pedro.


Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)  wrote:
> On 09/19/2011 01:59 PM, Rob Weir wrote:
>>
>> 2011/9/19 Jürgen Schmidt:
>>>
>>> On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir  wrote:
>>>
 If you haven't looked at it closely, it is probably worth a few minutes
 of your time to review our incubation status page, especially the
 items under "Copyright" and "Verify Distribution Rights".  It lists
 the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
 ASF have been received. It is only necessary to transfer rights for the
 package, the core code, and any new code produced by the project.

 -- Check and make sure that the files that have been donated have been
 updated to reflect the new ASF copyright.

 -- Check and make sure that for all code included with the
 distribution that is not under the Apache license, we have the right
 to combine with Apache-licensed code and redistribute.

 -- Check and make sure that all source code distributed by the project
 is covered by one or more of the following approved licenses: Apache,
 BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
 the same terms.

 Some of this is already going on, but it is hard to get a sense of who
 is doing what and how much progress we have made.  I wonder if we can
 agree to a more systematic approach?  This will make it easier to see
 the progress we're making and it will also make it easier for others
 to help.

 Suggestions:

 1) We need to get all files needed for the build into SVN.  Right now
 there are some that are copied down from the OpenOffice.org website
 during the build's bootstrap process.   Until we get the files all in
 one place it is hard to get a comprehensive view of our dependencies.

>>>
>>> do you mean to check in the files under ext_source into svn and remove them
>>> later on when we have cleaned up the code? Or do you mean to put them
>>> somewhere on apache extras?
>>> I would prefer to save these binary files under apache extras if possible.
>>>
>>
>>
>> Why not just keep it in SVN?   Moving things to Apache-Extras does not
>> help us with the IP review.   In other words, if we have a dependency
>> on a OSS module that has an incompatible license, then moving that
>> module to Apache Extras does not make that dependency go away.  We
>> still need to understand the nature of the dependency: a build tool, a
>> dynamic runtime dependency, a statically linked library, an optional
>> extension, or a necessary core module.
>>
>> If we find out, for example, that something in ext-sources is only
>> used as a build tool, and is not part of the release, then there is
>> nothing that prevents us from hosting it in SVN.   But if something is
>> a necessary library and it is under GPL, then this is a problem even
>> if we store it on Apache-Extras.
>>
>>
>>>

 2) Continue the CWS integrations.  Along with 1) this ensures that all
 the code we need for the release is in SVN.

 3)  Files that Oracle include in their SGA need to have the Apache
 license header inserted and the Sun/Oracle copyright migrated to the
 NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
 automate parts of this.

 4) Once the SGA files have the Apache headers, then we can make
 regular use of RAT to report on files that are lacking an Apache
 header.  Such files might be in one of the following categories:

 a) Files that Oracle owns the copyright on and which should be
 included in an amended SGA

 b) Files that have a compatible OSS license which we are permitted to
 use.  This might require that we add a mention of it to the NOTICE
 file.

 c) Files that have an incompatible OSS license.  These need to be
 removed/replaced.

 d) Files that have an OSS license that has not yet been
 reviewed/categorized by Apache legal affairs.  In that case we need to
 bring it to their attention.

 e) (Hypothetically) files that are not under an OSS license at all.
 E.g., a Microsoft header file.  These must be removed.

 5) We should track the resolution of each file, and do this
 publicly.  The audit trail is important.  Some ways we could do this
 might be:

 a) Track this in SVN properties.  So set ip:sga for the SGA files,
 ip:mit for files that are MIT licensed, etc.  This should be reflected
 in headers as well, but this is not always possible.  For example, we
 might have binary files where we cannot add headers, or cases where
 the OSS files do not have headers, but where we can prove their
 provenance via other means.

 b) Track this in a spreadsheet, one row per file.

 c) Track this in a text log file checked into SVN

 d) Track this in an annotated script that runs RAT, where the
 annotations document the reason for cases where we tell it to ignore a
 file or directory.
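
A rough illustration of what such an annotated script could look
like -- an untested Python sketch; the jar name is a placeholder and
the '-e' exclude option should be checked against the local RAT
version's --help before relying on it:

    #!/usr/bin/env python
    # run_rat.py -- annotated RAT run (item 5d): every exclusion carries
    # the reason it is skipped, so the audit trail lives next to the code.
    import subprocess

    RAT_JAR = "apache-rat-0.8.jar"  # placeholder; adjust to the local jar

    # (pattern, reason) pairs -- the reason is the documentation.
    EXCLUDES = [
        ("*.png", "binary images from the SGA; headers are impossible"),
        ("ext_sources/*", "third-party tarballs, tracked via NOTICE instead"),
    ]

    cmd = ["java", "-jar", RAT_JAR]
    for pattern, reason in EXCLUDES:
        print("excluding %s: %s" % (pattern, reason))
        cmd += ["-e", pattern]  # exclude; verify the flag with --help
    cmd.append(".")             # directory to audit

    with open("rat-report.txt", "w") as out:
        out.write(subprocess.check_output(cmd).decode("utf-8", "replace"))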

Re: A systematic approach to IP review?

2011-09-19 Thread Marcus (OOo)

On 09/19/2011 01:59 PM, Rob Weir wrote:

2011/9/19 Jürgen Schmidt:

On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir  wrote:


If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under "Copyright" and "Verify Distribution Rights".  It lists
the things we need to do, including:

  -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more of the following approved licenses: Apache,
BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
the same terms.

Some of this is already going on, but it is hard to get a sense of who
is doing what and how much progress we have made.  I wonder if we can
agree to a more systematic approach?  This will make it easier to see
the progress we're making and it will also make it easier for others
to help.

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.



do you mean to check in the files under ext_source into svn and remove them
later on when we have cleaned up the code? Or do you mean to put them
somewhere on apache extras?
I would prefer to save these binary files under apache extras if possible.




Why not just keep it in SVN?   Moving things to Apache-Extras does not
help us with the IP review.   In other words, if we have a dependency
on a OSS module that has an incompatible license, then moving that
module to Apache Extras does not make that dependency go away.  We
still need to understand the nature of the dependency: a build tool, a
dynamic runtime dependency, a statically linked library, an optional
extension, or a necessary core module.

If we find out, for example, that something in ext-sources is only
used as a build tool, and is not part of the release, then there is
nothing that prevents us from hosting it in SVN.   But if something is
a necessary library and it is under GPL, then this is a problem even
if we store it on Apache-Extras.






2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.

3)  Files that Oracle include in their SGA need to have the Apache
license header inserted and the Sun/Oracle copyright migrated to the
NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
automate parts of this.
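
To get a first inventory for that migration, a helper along these
lines might do -- an untested sketch, not the actual tooling; it just
greps for Sun/Oracle copyright lines so they can be reviewed and
collected into NOTICE:

    #!/usr/bin/env python
    # collect_copyrights.py -- list Sun/Oracle copyright lines (item 3)
    # as candidates for migration into the NOTICE file.
    import os
    import re

    PATTERN = re.compile(r"Copyright.*(Sun Microsystems|Oracle)", re.I)

    for root, dirs, files in os.walk("."):
        dirs[:] = [d for d in dirs if d != ".svn"]  # skip SVN metadata
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path) as fh:
                    for lineno, line in enumerate(fh, 1):
                        if PATTERN.search(line):
                            print("%s:%d: %s" % (path, lineno, line.strip()))
            except (IOError, UnicodeDecodeError):
                pass  # unreadable/binary file; review those separately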

4) Once the SGA files have the Apache headers, then we can make
regular use of RAT to report on files that are lacking an Apache
header.  Such files might be in one of the following categories:

a) Files that Oracle owns the copyright on and which should be
included in an amended SGA

b) Files that have a compatible OSS license which we are permitted to
use.  This might require that we add a mention of it to the NOTICE
file.

c) Files that have an incompatible OSS license.  These need to be
removed/replaced.

d) Files that have an OSS license that has not yet been
reviewed/categorized by Apache legal affairs.  In that case we need to
bring it to their attention.

e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.

5) We should track the resolution of each file, and do this
publicly.  The audit trail is important.  Some ways we could do this
might be:

a) Track this in SVN properties.  So set ip:sga for the SGA files,
ip:mit for files that are MIT licensed, etc.  This should be reflected
in headers as well, but this is not always possible.  For example, we
might have binary files where we cannot add headers, or cases where
the OSS files do not have headers, but where we can prove their
provenance via other means.

b) Track this in a spreadsheet, one row per file.

c) Track this in a text log file checked into SVN

d) Track this in an annotated script that runs RAT, where the
annotations document the reason for cases where we tell it to ignore a
file or directory.

6) Iterate until we have a clean RAT report.
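
The "iterate until clean" step is easy to script if the report is
machine-checked; a sketch, assuming the plain-text RAT report carries
a summary line of the form "N Unknown Licenses" (verify against the
output of the RAT version actually used):

    #!/usr/bin/env python
    # check_rat.py -- exit non-zero while the RAT report still lists
    # unknown licenses, so the loop in item 6 can gate commits/builds.
    import re
    import sys

    text = open("rat-report.txt").read()
    match = re.search(r"(\d+)\s+Unknown Licenses", text)
    if match is None:
        sys.exit("no summary line found; check the report format")
    unknown = int(match.group(1))
    print("files with unknown licenses: %d" % unknown)
    sys.exit(1 if unknown else 0)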

7) Goal should be for anyone today to be able to see what work remains
for IP clearance, as well as for someone 5 years from now to be able
to tell what we did.  Tracking this on the community wiki is probably
not good enough, since we've previously talked about dropping that
wiki and going to MWiki.



talked about it yes, but did we reach a final decision?

Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
2011/9/19 Jürgen Schmidt :
> On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir  wrote:
>
>> If you haven't looked at it closely, it is probably worth a few minutes
>> of your time to review our incubation status page, especially the
>> items under "Copyright" and "Verify Distribution Rights".  It lists
>> the things we need to do, including:
>>
>>  -- Check and make sure that the papers that transfer rights to the
>> ASF have been received. It is only necessary to transfer rights for the
>> package, the core code, and any new code produced by the project.
>>
>> -- Check and make sure that the files that have been donated have been
>> updated to reflect the new ASF copyright.
>>
>> -- Check and make sure that for all code included with the
>> distribution that is not under the Apache license, we have the right
>> to combine with Apache-licensed code and redistribute.
>>
>> -- Check and make sure that all source code distributed by the project
>> is covered by one or more of the following approved licenses: Apache,
>> BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
>> the same terms.
>>
>> Some of this is already going on, but it is hard to get a sense of who
>> is doing what and how much progress we have made.  I wonder if we can
>> agree to a more systematic approach?  This will make it easier to see
>> the progress we're making and it will also make it easier for others
>> to help.
>>
>> Suggestions:
>>
>> 1) We need to get all files needed for the build into SVN.  Right now
>> there are some that are copied down from the OpenOffice.org website
>> during the build's bootstrap process.   Until we get the files all in
>> one place it is hard to get a comprehensive view of our dependencies.
>>
>
> do you mean to check in the files under ext_source into svn and remove them
> later on when we have cleaned up the code? Or do you mean to put them
> somewhere on apache extras?
> I would prefer to save these binary files under apache extras if possible.
>


Why not just keep it in SVN?   Moving things to Apache-Extras does not
help us with the IP review.   In other words, if we have a dependency
on a OSS module that has an incompatible license, then moving that
module to Apache Extras does not make that dependency go away.  We
still need to understand the nature of the dependency: a build tool, a
dynamic runtime dependency, a statically linked library, an optional
extension, or a necessary core module.

If we find out, for example, that something in ext-sources is only
used as a build tool, and is not part of the release, then there is
nothing that prevents us from hosting it in SVN.   But if something is
a necessary library and it is under GPL, then this is a problem even
if we store it on Apache-Extras.


>
>>
>> 2) Continue the CWS integrations.  Along with 1) this ensures that all
>> the code we need for the release is in SVN.
>>
>> 3)  Files that Oracle include in their SGA need to have the Apache
>> license header inserted and the Sun/Oracle copyright migrated to the
>> NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
>> automate parts of this.
>>
>> 4) Once the SGA files have the Apache headers, then we can make
>> regular use of RAT to report on files that are lacking an Apache
>> header.  Such files might be in one of the following categories:
>>
>> a) Files that Oracle owns the copyright on and which should be
>> included in an amended SGA
>>
>> b) Files that have a compatible OSS license which we are permitted to
>> use.  This might require that we add a mention of it to the NOTICE
>> file.
>>
>> c) Files that have an incompatible OSS license.  These need to be
>> removed/replaced.
>>
>> d) Files that have an OSS license that has not yet been
>> reviewed/categorized by Apache legal affairs.  In that case we need to
>> bring it to their attention.
>>
>> e) (Hypothetically) files that are not under an OSS license at all.
>> E.g., a Microsoft header file.  These must be removed.
>>
>> 5) We should track the resolution of each file, and do this
>> publicly.  The audit trail is important.  Some ways we could do this
>> might be:
>>
>> a) Track this in SVN properties.  So set ip:sga for the SGA files,
>> ip:mit for files that are MIT licensed, etc.  This should be reflected
>> in headers as well, but this is not always possible.  For example, we
>> might have binary files where we cannot add headers, or cases where
>> the OSS files do not have headers, but where we can prove their
>> provenance via other means.
>>
>> b) Track this in a spreadsheet, one row per file.
>>
>> c) Track this in a text log file checked into SVN
>>
>> d) Track this in an annotated script that runs RAT, where the
>> annotations document the reason for cases where we tell it to ignore a
>> file or directory.
>>
>> 6) Iterate until we have a clean RAT report.
>>
>> 7) Goal should be for anyone today to be able to see what work remains
>> for IP clearance, as well as for someone 5 years from now to be able
>> to tell what we did.  Tracking this on the community wiki is probably
>> not good enough, since we've previously talked about dropping that
>> wiki and going to MWiki.

Re: A systematic approach to IP review?

2011-09-19 Thread Rob Weir
On Sun, Sep 18, 2011 at 9:34 PM, Pedro Giffuni  wrote:
> Hi;
>
> Is there an updated SGA already?
>

Not that I know of.   But we can and should go ahead with IP clearance
using the SGA we already have.   In fact, starting that process will
help us identify exactly which files need to be added to the updated
SGA.

-Rob


> I think there will likely be a set of files of uncertain license
> that we should move to apache-extras. I am referring specifically
> to the dictionaries: Oracle might hold rights over some but not
> all. I propose we rescue myspell in apache-extras and put the
> dictionaries there to keep it as an alternative. I have no idea
> where to get MySpell though.
>
> While here, if there's still interest in maintaining the Hg
> history, bitbucket.org seems to be a nice alternative: it's
> rather specialized in Mercurial.
>
> Cheers,
>
> Pedro.
>
> On Sun, 18 Sep 2011 20:27:05 -0400, Rob Weir  wrote:
>>
>> If you haven't looked at it closely, it is probably worth a few minutes
>> of your time to review our incubation status page, especially the
>> items under "Copyright" and "Verify Distribution Rights".  It lists
>> the things we need to do, including:
>>
>>  -- Check and make sure that the papers that transfer rights to the
>> ASF have been received. It is only necessary to transfer rights for the
>> package, the core code, and any new code produced by the project.
>>
>> -- Check and make sure that the files that have been donated have been
>> updated to reflect the new ASF copyright.
>>
>> -- Check and make sure that for all code included with the
>> distribution that is not under the Apache license, we have the right
>> to combine with Apache-licensed code and redistribute.
>>
>> -- Check and make sure that all source code distributed by the project
>> is covered by one or more of the following approved licenses: Apache,
>> BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
>> the same terms.
>>
>> Some of this is already going on, but it is hard to get a sense of who
>> is doing what and how much progress we have made.  I wonder if we can
>> agree to a more systematic approach?  This will make it easier to see
>> the progress we're making and it will also make it easier for others
>> to help.
>>
>> Suggestions:
>>
>> 1) We need to get all files needed for the build into SVN.  Right now
>> there are some that are copied down from the OpenOffice.org website
>> during the build's bootstrap process.   Until we get the files all in
>> one place it is hard to get a comprehensive view of our dependencies.
>>
>> 2) Continue the CWS integrations.  Along with 1) this ensures that all
>> the code we need for the release is in SVN.
>>
>> 3)  Files that Oracle include in their SGA need to have the Apache
>> license header inserted and the Sun/Oracle copyright migrated to the
>> NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
>> automate parts of this.
>>
>> 4) Once the SGA files have the Apache headers, then we can make
>> regular use of RAT to report on files that are lacking an Apache
>> header.  Such files might be in one of the following categories:
>>
>> a) Files that Oracle owns the copyright on and which should be
>> included in an amended SGA
>>
>> b) Files that have a compatible OSS license which we are permitted to
>> use.  This might require that we add a mention of it to the NOTICE
>> file.
>>
>> c) Files that have an incompatible OSS license.  These need to be
>> removed/replaced.
>>
>> d) Files that have an OSS license that has not yet been
>> reviewed/categorized by Apache legal affairs.  In that case we need to
>> bring it to their attention.
>>
>> e) (Hypothetically) files that are not under an OSS license at all.
>> E.g., a Microsoft header file.  These must be removed.
>>
>> 5) We should track the resolution of each file, and do this
>> publicly.  The audit trail is important.  Some ways we could do this
>> might be:
>>
>> a) Track this in SVN properties.  So set ip:sga for the SGA files,
>> ip:mit for files that are MIT licensed, etc.  This should be reflected
>> in headers as well, but this is not always possible.  For example, we
>> might have binary files where we cannot add headers, or cases where
>> the OSS files do not have headers, but where we can prove their
>> provenance via other means.
>>
>> b) Track this in a spreadsheet, one row per file.
>>
>> c) Track this in a text log file checked into SVN
>>
>> d) Track this in an annotated script that runs RAT, where the
>> annotations document the reason for cases where we tell it to ignore a
>> file or directory.
>>
>> 6) Iterate until we have a clean RAT report.
>>
>> 7) Goal should be for anyone today to be able to see what work remains
>> for IP clearance, as well as for someone 5 years from now to be able
>> to tell what we did.  Tracking this on the community wiki is probably
>> not good enough, since we've previously talked about dropping that
>> wiki and going to MWiki.
>>
>>
>> -Rob
>>
>

Re: A systematic approach to IP review?

2011-09-19 Thread Jürgen Schmidt
On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir  wrote:

> If you haven't looked at it closely, it is probably worth a few minutes
> of your time to review our incubation status page, especially the
> items under "Copyright" and "Verify Distribution Rights".  It lists
> the things we need to do, including:
>
>  -- Check and make sure that the papers that transfer rights to the
> ASF have been received. It is only necessary to transfer rights for the
> package, the core code, and any new code produced by the project.
>
> -- Check and make sure that the files that have been donated have been
> updated to reflect the new ASF copyright.
>
> -- Check and make sure that for all code included with the
> distribution that is not under the Apache license, we have the right
> to combine with Apache-licensed code and redistribute.
>
> -- Check and make sure that all source code distributed by the project
> is covered by one or more of the following approved licenses: Apache,
> BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
> the same terms.
>
> Some of this is already going on, but it is hard to get a sense of who
> is doing what and how much progress we have made.  I wonder if we can
> agree to a more systematic approach?  This will make it easier to see
> the progress we're making and it will also make it easier for others
> to help.
>
> Suggestions:
>
> 1) We need to get all files needed for the build into SVN.  Right now
> there are some that are copied down from the OpenOffice.org website
> during the build's bootstrap process.   Until we get the files all in
> one place it is hard to get a comprehensive view of our dependencies.
>

do you mean to check in the files under ext_source into svn and remove them
later on when we have cleaned up the code? Or do you mean to put them
somewhere on apache extras?
I would prefer to save these binary files under apache extras if possible.


>
> 2) Continue the CWS integrations.  Along with 1) this ensures that all
> the code we need for the release is in SVN.
>
> 3)  Files that Oracle include in their SGA need to have the Apache
> license header inserted and the Sun/Oracle copyright migrated to the
> NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
> automate parts of this.
>
> 4) Once the SGA files have the Apache headers, then we can make
> regular use of RAT to report on files that are lacking an Apache
> header.  Such files might be in one of the following categories:
>
> a) Files that Oracle owns the copyright on and which should be
> included in an amended SGA
>
> b) Files that have a compatible OSS license which we are permitted to
> use.  This might require that we add a mention of it to the NOTICE
> file.
>
> c) Files that have an incompatible OSS license.  These need to be
> removed/replaced.
>
> d) Files that have an OSS license that has not yet been
> reviewed/categorized by Apache legal affairs.  In that case we need to
> bring it to their attention.
>
> e) (Hypothetically) files that are not under an OSS license at all.
> E.g., a Microsoft header file.  These must be removed.
>
> 5) We should track the resolution of each file, and do this
> publicly.  The audit trail is important.  Some ways we could do this
> might be:
>
> a) Track this in SVN properties.  So set ip:sga for the SGA files,
> ip:mit for files that are MIT licensed, etc.  This should be reflected
> in headers as well, but this is not always possible.  For example, we
> might have binary files where we cannot add headers, or cases where
> the OSS files do not have headers, but where we can prove their
> provenance via other means.
>
> b) Track this in a spreadsheet, one row per file.
>
> c) Track this in a text log file checked into SVN
>
> d) Track this in an annotated script that runs RAT, where the
> annotations document the reason for cases where we tell it to ignore a
> file or directory.
>
> 6) Iterate until we have a clean RAT report.
>
> 7) Goal should be for anyone today to be able to see what work remains
> for IP clearance, as well as for someone 5 years from now to be able
> to tell what we did.  Tracking this on the community wiki is probably
> not good enough, since we've previously talked about dropping that
> wiki and going to MWiki.
>

talked about it yes, but did we reach a final decision?

The migrated wiki is available under http://ooo-wiki.apache.org/wiki and can
be used. Do we want to continue with this wiki now? It's still not clear to
me at the moment.

But we need a place to document the IP clearance and under
http://ooo-wiki.apache.org/wiki/ApacheMigration we have already some
information.

Juergen


>
>
> -Rob
>
>
> [1] http://incubator.apache.org/projects/openofficeorg.html
>
> [2] http://incubator.apache.org/rat/
>


Re: A systematic approach to IP review?

2011-09-19 Thread Jürgen Schmidt
On Mon, Sep 19, 2011 at 3:34 AM, Pedro Giffuni  wrote:

> Hi;
>
> Is there an updated SGA already?
>

Good question, and where can we find it?

Juergen


>
> I think there will likely be a set of files of uncertain license
> that we should move to apache-extras. I am referring specifically
> to the dictionaries: Oracle might hold rights over some but not
> all. I propose we rescue myspell in apache-extras and put the
> dictionaries there to keep it as an alternative. I have no idea
> where to get MySpell though.
>
> While here, if there's still interest in maintaining the Hg
> history, bitbucket.org seems to be a nice alternative: it's
> rather specialized in Mercurial.
>
> Cheers,
>
> Pedro.
>
>
> On Sun, 18 Sep 2011 20:27:05 -0400, Rob Weir  wrote:
>
>> If you haven't looked at it closely, it is probably worth a few minutes
>> of your time to review our incubation status page, especially the
>> items under "Copyright" and "Verify Distribution Rights".  It lists
>> the things we need to do, including:
>>
>>  -- Check and make sure that the papers that transfer rights to the
>> ASF have been received. It is only necessary to transfer rights for the
>> package, the core code, and any new code produced by the project.
>>
>> -- Check and make sure that the files that have been donated have been
>> updated to reflect the new ASF copyright.
>>
>> -- Check and make sure that for all code included with the
>> distribution that is not under the Apache license, we have the right
>> to combine with Apache-licensed code and redistribute.
>>
>> -- Check and make sure that all source code distributed by the project
>> is covered by one or more of the following approved licenses: Apache,
>> BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
>> the same terms.
>>
>> Some of this is already going on, but it is hard to get a sense of who
>> is doing what and how much progress we have made.  I wonder if we can
>> agree to a more systematic approach?  This will make it easier to see
>> the progress we're making and it will also make it easier for others
>> to help.
>>
>> Suggestions:
>>
>> 1) We need to get all files needed for the build into SVN.  Right now
>> there are some that are copied down from the OpenOffice.org website
>> during the build's bootstrap process.   Until we get the files all in
>> one place it is hard to get a comprehensive view of our dependencies.
>>
>> 2) Continue the CWS integrations.  Along with 1) this ensures that all
>> the code we need for the release is in SVN.
>>
>> 3)  Files that Oracle include in their SGA need to have the Apache
>> license header inserted and the Sun/Oracle copyright migrated to the
>> NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
>> automate parts of this.
>>
>> 4) Once the SGA files have the Apache headers, then we can make
>> regular use of RAT to report on files that are lacking an Apache
>> header.  Such files might be in one of the following categories:
>>
>> a) Files that Oracle owns the copyright on and which should be
>> included in an amended SGA
>>
>> b) Files that have a compatible OSS license which we are permitted to
>> use.  This might require that we add a mention of it to the NOTICE
>> file.
>>
>> c) Files that have an incompatible OSS license.  These need to be
>> removed/replaced.
>>
>> d) Files that have an OSS license that has not yet been
>> reviewed/categorized by Apache legal affairs.  In that case we need to
>> bring it to their attention.
>>
>> e) (Hypothetically) files that are not under an OSS license at all.
>> E.g., a Microsoft header file.  These must be removed.
>>
>> 5) We should track the resolution of each file, and do this
>> publicly.  The audit trail is important.  Some ways we could do this
>> might be:
>>
>> a) Track this in SVN properties.  So set ip:sga for the SGA files,
>> ip:mit for files that are MIT licensed, etc.  This should be reflected
>> in headers as well, but this is not always possible.  For example, we
>> might have binary files where we cannot add headers, or cases where
>> the OSS files do not have headers, but where we can prove their
>> provenance via other means.
>>
>> b) Track this in a spreadsheet, one row per file.
>>
>> c) Track this in a text log file checked into SVN
>>
>> d) Track this in an annotated script that runs RAT, where the
>> annotations document the reason for cases where we tell it to ignore a
>> file or directory.
>>
>> 6) Iterate until we have a clean RAT report.
>>
>> 7) Goal should be for anyone today to be able to see what work remains
>> for IP clearance, as well as for someone 5 years from now to be able
>> to tell what we did.  Tracking this on the community wiki is probably
>> not good enough, since we've previously talked about dropping that
>> wiki and going to MWiki.
>>
>>
>> -Rob
>>
>>
>> [1] http://incubator.apache.org/projects/openofficeorg.html
>>
>> [2] http://incubator.apache.org/rat/

Re: A systematic approach to IP review?

2011-09-18 Thread Pedro Giffuni

Hi;

Is there an updated SGA already?

I think there will likely be a set of files of uncertain license
that we should move to apache-extras. I am referring specifically
to the dictionaries: Oracle might hold rights over some but not
all. I propose we rescue myspell in apache-extras and put the
dictionaries there to keep it as an alternative. I have no idea
where to get MySpell though.

While here, if there's still interest in maintaining the Hg
history, bitbucket.org seems to be a nice alternative: it's
rather specialized in Mercurial.

Cheers,

Pedro.

On Sun, 18 Sep 2011 20:27:05 -0400, Rob Weir  wrote:

If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under "Copyright" and "Verify Distribution Rights".  It lists
the things we need to do, including:

 -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more of the following approved licenses: Apache,
BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
the same terms.

Some of this is already going on, but it is hard to get a sense of who
is doing what and how much progress we have made.  I wonder if we can
agree to a more systematic approach?  This will make it easier to see
the progress we're making and it will also make it easier for others
to help.

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.

2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.

3)  Files that Oracle include in their SGA need to have the Apache
license header inserted and the Sun/Oracle copyright migrated to the
NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
automate parts of this.

4) Once the SGA files have the Apache headers, then we can make
regular use of RAT to report on files that are lacking an Apache
header.  Such files might be in one of the following categories:

a) Files that Oracle owns the copyright on and which should be
included in an amended SGA

b) Files that have a compatible OSS license which we are permitted to
use.  This might require that we add a mention of it to the NOTICE
file.

c) Files that have an incompatible OSS license.  These need to be
removed/replaced.

d) Files that have an OSS license that has not yet been
reviewed/categorized by Apache legal affairs.  In that case we need to
bring it to their attention.

e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.

5) We should track the resolution of each file, and do this
publicly.  The audit trail is important.  Some ways we could do this
might be:

a) Track this in SVN properties.  So set ip:sga for the SGA files,
ip:mit for files that are MIT licensed, etc.  This should be reflected
in headers as well, but this is not always possible.  For example, we
might have binary files where we cannot add headers, or cases where
the OSS files do not have headers, but where we can prove their
provenance via other means.

b) Track this in a spreadsheet, one row per file.

c) Track this in a text log file checked into SVN

d) Track this in an annotated script that runs RAT, where the
annotations document the reason for cases where we tell it to ignore a
file or directory.

6) Iterate until we have a clean RAT report.

7) Goal should be for anyone today to be able to see what work remains
for IP clearance, as well as for someone 5 years from now to be able
to tell what we did.  Tracking this on the community wiki is probably
not good enough, since we've previously talked about dropping that
wiki and going to MWiki.


-Rob


[1] http://incubator.apache.org/projects/openofficeorg.html

[2] http://incubator.apache.org/rat/




RE: A systematic approach to IP review?

2011-09-18 Thread Dennis E. Hamilton
+1

-Original Message-
From: Rob Weir [mailto:robw...@apache.org] 
Sent: Sunday, September 18, 2011 17:27
To: ooo-dev@incubator.apache.org
Subject: A systematic approach to IP review?

If you haven't looked at it closely, it is probably worth a few minutes
of your time to review our incubation status page, especially the
items under "Copyright" and "Verify Distribution Rights".  It lists
the things we need to do, including:

 -- Check and make sure that the papers that transfer rights to the
ASF have been received. It is only necessary to transfer rights for the
package, the core code, and any new code produced by the project.

-- Check and make sure that the files that have been donated have been
updated to reflect the new ASF copyright.

-- Check and make sure that for all code included with the
distribution that is not under the Apache license, we have the right
to combine with Apache-licensed code and redistribute.

-- Check and make sure that all source code distributed by the project
is covered by one or more of the following approved licenses: Apache,
BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
the same terms.

Some of this is already going on, but it is hard to get a sense of who
is doing what and how much progress we have made.  I wonder if we can
agree to a more systematic approach?  This will make it easier to see
the progress we're making and it will also make it easier for others
to help.

Suggestions:

1) We need to get all files needed for the build into SVN.  Right now
there are some that are copied down from the OpenOffice.org website
during the build's bootstrap process.   Until we get the files all in
one place it is hard to get a comprehensive view of our dependencies.

2) Continue the CWS integrations.  Along with 1) this ensures that all
the code we need for the release is in SVN.

3)  Files that Oracle include in their SGA need to have the Apache
license header inserted and the Sun/Oracle copyright migrated to the
NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
automate parts of this.

4) Once the SGA files have the Apache headers, then we can make
regular use of RAT to report on files that are lacking an Apache
header.  Such files might be in one of the following categories:

a) Files that Oracle owns the copyright on and which should be
included in an amended SGA

b) Files that have a compatible OSS license which we are permitted to
use.  This might require that we add a mention of it to the NOTICE
file.

c) Files that have an incompatible OSS license.  These need to be
removed/replaced.

d) Files that have an OSS license that has not yet been
reviewed/categorized by Apache legal affairs.  In that case we need to
bring it to their attention.

e) (Hypothetically) files that are not under an OSS license at all.
E.g., a Microsoft header file.  These must be removed.

5) We should track the resolution of each file, and do this
publicly.  The audit trail is important.  Some ways we could do this
might be:

a) Track this in SVN properties.  So set ip:sga for the SGA files,
ip:mit for files that are MIT licensed, etc.  This should be reflected
in headers as well, but this is not always possible.  For example, we
might have binary files where we cannot add headers, or cases where
the OSS files do not have headers, but where we can prove their
provenance via other means.

b) Track this in a spreadsheet, one row per file.

c) Track this in a text log file checked into SVN

d) Track this in an annotated script that runs RAT, where the
annotations document the reason for cases where we tell it to ignore a
file or directory.

6) Iterate until we have a clean RAT report.

7) Goal should be for anyone today to be able to see what work remains
for IP clearance, as well as for someone 5 years from now to be able
to tell what we did.  Tracking this on the community wiki is probably
not good enough, since we've previously talked about dropping that
wiki and going to MWiki.
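
A crude way to make that progress visible, assuming the ip:* property
convention from 5a is adopted -- an untested sketch; it counts items
carrying any versioned property, so it overcounts if other properties
are in use:

    #!/usr/bin/env python
    # ip_progress.py -- rough progress report for item 7: how many
    # versioned files carry a property (ideally only ip:* tags).
    import subprocess

    def lines(cmd):
        return subprocess.check_output(cmd).decode("utf-8", "replace").splitlines()

    # 'svn proplist -v -R .' prints "Properties on 'path':" per tagged item.
    tagged = sum(1 for l in lines(["svn", "proplist", "-v", "-R", "."])
                 if l.startswith("Properties on "))
    # 'svn ls -R .' lists every versioned path; directories end with '/'.
    total = sum(1 for l in lines(["svn", "ls", "-R", "."])
                if not l.endswith("/"))
    print("tagged: %d of %d versioned files" % (tagged, total))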


-Rob


[1] http://incubator.apache.org/projects/openofficeorg.html

[2] http://incubator.apache.org/rat/