Reproducible, precompiled .o files: what say policy+gpl?

2004-10-18 Thread Wesley W. Terpstra
I am developing a very CPU-intensive, open-source error-correcting code.

The intention of this code is that you can split a large (> 5GB)
file across multiple packets. Whenever you receive enough packets that
their combined size = the file size, you can decode the packets to
recover the file, regardless of which packets you get.

This means a lot of calculation over gigabytes worth of data.
Therefore, speed is of utmost importance in this application.

The project itself includes an ocaml compiler (derived from fftw) which
generates C code to perform various types of Fast Galois Transforms. Some
of the output C code uses SSE2 exclusively. This C code is then compiled 
and linked in with the other C sources that make up the application.
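
To give a flavour of what the generated inner loops contain (this is a
hand-written sketch, not the actual generated code, and the exact field and
reduction scheme are simplified here): the transforms boil down to masses of
additions and multiplications modulo the Mersenne prime 2^31-1, which SSE2
can do four lanes at a time. For example:

#include <stdint.h>
#include <emmintrin.h>                    /* SSE2 intrinsics */

#define P 0x7fffffffu                     /* p = 2^31 - 1 */

/* scalar: (a + b) mod p, for a, b <= p */
static inline uint32_t add_mod_p(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;                   /* <= 2^32 - 2, no overflow */
    s = (s & P) + (s >> 31);              /* 2^31 mod p = 1: fold bit 31 in */
    return s == P ? 0 : s;
}

/* SSE2: the same on four lanes at once, leaving results partially
 * reduced in [0, p], where p stands for 0 */
static inline __m128i add_mod_p_sse2(__m128i a, __m128i b)
{
    const __m128i p  = _mm_set1_epi32((int)P);
    __m128i s  = _mm_add_epi32(a, b);
    __m128i lo = _mm_and_si128(s, p);
    __m128i hi = _mm_srli_epi32(s, 31);
    return _mm_add_epi32(lo, hi);
}

The generated files are tens of kilobytes of exactly this kind of
straight-line arithmetic, which is where the two compilers diverge.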

Now, on to the dilemma: icc produces object files which run ~2* faster
than the object files produced by gcc when SSE2 is used. (The non-SSE2
versions are also faster, but not as significantly.) Both gcc and icc can
compile the generated C files. My University will shortly own a licence
for icc which allows us to distribute binaries.

So, when it comes time to release this and include it in a .deb, I ask
myself: what would happen if I included (with the C source and ocaml
compiler) some precompiled object files for i386? As long as the build
target is i386, these object files could be linked in instead of using
gcc to produce (slower) object files. This would mean a 2* speedup for
users, which is vital in order to reach line-speed. Other platforms 
recompile as normal.

On the other hand, is this still open source?
Is this allowed by policy?
Can this go into main?

Some complaints and my answers below:

C: How do we know the object files aren't trojaned? 
A: Because I am both the upstream developer and (will be) the debian 
   maintainer, and I say they aren't.

C: You can't recompile the application without ICC, which is not free.
A: You can still rebuild it with gcc.

C: But you can't rebuild _exactly_ the same binary.
A: This is essentially *my* question: is this required by policy/gpl?
   Remember, you can always get ICC yourself. If there is a GPL problem, 
   then I think no MSVC application can be GPL either.

C: You're just too lazy to hand-optimize the assembler and include that.
A: You're right. Some of those auto-generated C files are > 64k of
   completely incomprehensible math. 
   I could include .S files instead of .o files, though, if that helps.

C: You're just too lazy to fix gcc.
A: I also wouldn't know where to begin, and I already file bugs.
   Even if I did know where to begin, gcc is not my responsibility.

C: A (security) bugfix won't get linked in.
A: A bug in the auto-generated C code is unlikely, and if there were one,
   changing the .c file makes it newer than the .o, which means gcc will
   rebuild it.

That's it!
What are the thoughts of GPL and policy experts?

PS. I will provide the source code to anyone who requests it, but not yet
under the GPL. Only after I publish a paper about the algorithm will the 
code be released under the GPL.

-- 
Wesley W. Terpstra <[EMAIL PROTECTED]>




Re: Reproducible, precompiled .o files: what say policy+gpl?

2004-10-18 Thread Wesley W. Terpstra
Since there's one GPL question left, I am still posting to debian-legal.
The legal question is marked ** for those who want to skip the rest.

On Mon, Oct 18, 2004 at 11:49:56AM -0700, Josh Triplett wrote:
> Whether your university owns a license or not does not really affect
> Debian.  icc cannot be included in Debian main.

No, but debian can distribute precompiled object files (legally).
The binaries I meant were the object files.

> Keep in mind that if your algorithm is as good as it sounds, it will be
> around for a long time.  Even if a GCC-compiled version can't achieve
> line-speed right now, if all it needs is a 2x speedup, normal increases
> in computer technology will provide that soon enough.

True enough, but as processors get faster, so does bandwidth.
I expect that ultimately, it will always need to be as fast as possible.

> Consider this: any package with non-free Build-Depends that aren't
> strictly required at runtime could take this approach, by shipping
> precompiled files.  For example, this has come up several times with
> Java packages that tried to just ship a (Sun/Blackdown-compiled) .jar
> file in the source package.  The answer here is the same: you can't ship
> compiled files to avoid having a non-free build-depends (and shouldn't
> ship compiled files at all, even if they were compiled with a Free
> compiler); the package should always be built from source.

That is a good argument; thank you.

> * Upload a package to main which builds using GCC.  (As a side note, you
> might check to see if GCC 3.4/3.5 produces significantly better code.)

gcc-3.3 is not an issue; it ICEs.
gcc-3.4.2 is the version I was referring to.

> * Make it easy for people to rebuild using icc.  See the openoffice.org
> packages for an example; they contain support for rebuilding using a
> non-free JDK based on a flag in DEB_BUILD_OPTIONS.

That's a good idea.

> * Supply icc-built packages either on your people.debian.org site or in
> contrib; if the latter, you need to use a different package name and
> conflict with the gcc-built package in main.

Josselin Mouette <[EMAIL PROTECTED]> said:
> If you really want to distribute a package built with icc, you should
> make a separate package in the contrib section, and have it conflict
> with the package in main.

Yes, this sounds like a good plan.

Put the normal gcc-built version, rsgt, in main, where the i386 deb has:
Recommends: rsgt-icc

rsgt-icc sits in contrib, completely built by icc (not just some .o files)
Conflicts: rsgt
Provides: rsgt
Replaces: rsgt

If an i386 user (with contrib sourced) runs 'apt-get install rsgt'
will that make apt install rsgt-icc? That's what I hope to accomplish.

(PS. rsgt is not the final name)

**
For it to sit in contrib, would I have to include the source code in contrib
as well? Or would the fact that the source code was in main already satisfy
the GPL requirement of source availability?

Clearly, it could still sit in non-free without the source, but contrib is
more accurate imo. If there's no reason to include 2* the source, I see no
reason to present 2* the load to the ftp-servers.

> it is acceptable *under the GPL* to provide binaries compiled with
> non-free compilers, unless the resulting compiled binary is somehow
> derivative of a non-free work that is not an OS component. In the end, if
> people want to exercise their rights under the GPL, they will want the
> source, not the binaries, and you are supplying that source alongside the
> binaries, which satisfies the GPL.

Then I suppose it makes sense to just supply a precompiled version (with
icc) and a source tarball as upstream. The debian version would work as
covered already above.

> > PS. I will provide the source code to anyone who requests it, but not yet
> > under the GPL. Only after I publish a paper about the algorithm will the 
> > code be released under the GPL.
> 
> Keep in mind that FFTW is GPLed, so unless you have made other
> arrangements with its copyright holders, you need to refrain from
> supplying the code or binaries to anyone unless under the GPL.

Oh, that's a good point. 
I withdraw my offer of a private pre-release.
You can only have a copy after I publish. ;)

Thank you for your detailed explanation and answer.

-- 
Wesley W. Terpstra




Re: Reproducible, precompiled .o files: what say policy+gpl?

2004-10-18 Thread Wesley W. Terpstra
On Mon, Oct 18, 2004 at 01:33:07PM -0500, John Hasler wrote:
> Josselin Mouette writes:
> > Main must be built with only packages from main.
> 
> Packages in main must be _buildable_ with only packages from main.

Interesting.

This slight difference in wording sounds to me like I would indeed be able
to include prebuilt object files, so long as the package could be built
without them. Is that correct?

The actual text in policy is:
* must not require a package outside of main for compilation or execution
(thus, the package must not declare a "Depends", "Recommends", or
"Build-Depends" relationship on a non-main package)

This wording appears to back up what you say (John).
The clause 'must not require' is fine with my case. Since the source files
can be rebuilt with gcc, icc is not required. Execution is a non-issue.

At this point my question is only academic; the pure-gcc in main,
icc-prebuilt in contrib solution seems to solve my concerns just as well.

-- 
Wesley W. Terpstra




Re: Reproducible, precompiled .o files: what say policy+gpl?

2004-10-18 Thread Wesley W. Terpstra
On Mon, Oct 18, 2004 at 05:15:20PM -0400, Glenn Maynard wrote:
> Isn't this just what PAR and PAR2 do (in conjunction with a file splitter)?

Thanks for the pointer to this project; I didn't know about it.
However, to answer your question: no.

PAR2 uses Reed-Solomon codes; my project also uses a variant of
Reed-Solomon. However, the result is completely different.

PAR2 aims to correct corruption of data (as well as loss).
The biggest difference is that it also trades disk overhead for speed.

My program does not attempt to correct corruption (though there are
algorithms which can do so, just not efficiently). If corruption 
resilience is needed, you would have to add a CRC to each packet.

[ For those who know how Reed-Solomon works, I am applying Reed-Solomon 
over the entire file as one block -- using an alphabet of size ~2^62. 
Normally Reed-Solomon breaks the file into blocks of a fixed size and 
adds the correction to these small blocks. ]

For example, with par2, I have 
-rw-r--r--  1 terpstra terpstra 627103 2004-10-19 00:18 foo.pdf

I ran: par2 create foo.pdf
It generated:
-rw-r--r--  1 terpstra terpstra  40600 2004-10-19 00:19 foo.pdf.par2
-rw-r--r--  1 terpstra terpstra  40980 2004-10-19 00:19 foo.pdf.vol000+01.par2
-rw-r--r--  1 terpstra terpstra  81860 2004-10-19 00:19 foo.pdf.vol001+02.par2
-rw-r--r--  1 terpstra terpstra 123120 2004-10-19 00:19 foo.pdf.vol003+04.par2
-rw-r--r--  1 terpstra terpstra 165140 2004-10-19 00:19 foo.pdf.vol007+08.par2
-rw-r--r--  1 terpstra terpstra 208680 2004-10-19 00:19 foo.pdf.vol015+16.par2
-rw-r--r--  1 terpstra terpstra 255260 2004-10-19 00:19 foo.pdf.vol031+32.par2
-rw-r--r--  1 terpstra terpstra 257540 2004-10-19 00:19 foo.pdf.vol063+38.par2

I then removed foo.pdf and tried: par2 recover foo.pdf.par2
...
You have 0 out of 2010 data blocks available.
You have 101 recovery blocks available.
Repair is not possible.
You need 1909 more recovery blocks to be able to repair.

To which, I say, wtf!
The data I have totals
du -sb .
1173540 .

That's even more than foo.pdf!
[EMAIL PROTECTED]:~/x/y$ du -bs ../foo.pdf
627103  foo.pdf

... and yet par2 tells me I need 1909 more (I have 101).
That means I would need 18* more information in order to recover foo.pdf!
I have to admit, I am surprised by this. I would have expected a better
ratio, even with very old techniques.

My program is entirely different.
Let's say it's called rsgt (I haven't decided on a name yet).

You run: rsgt 1024 <n> foo.pdf
You then get <n> files all of size 1024 (regardless of the size of foo.pdf).

As long as you have enough files that the total size is the same or larger
than the size of foo.pdf, you can recover foo.pdf.

You can create nearly 2^62 different such files, and start generating them
from any starting point. This means you could essentially send a never
ending stream of (network sized) packets which people could 'tap into'.
After getting any subsequence of packets, as long as you get enough so that
their total is the file size, you are done.

This is what is different about my program: it has zero space overhead (well,
it has an 8-byte header per packet, but that doesn't depend on packet length).
What is new in terms of research is that I have an algorithm which can do
the decoding quickly.

rsgt does not aim to add 'parity', as I think the name 'par'2 is intended to
suggest. Rather, it transforms a file into a stream of user-defined-size
packets which goes on practically forever. ANY of the packets will do to get
back the original file, as long as their combined size matches the file size.
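
To make the mechanism concrete, here is a toy sketch (NOT the real rsgt; the
field and the packing of symbols into packets are simplified): treat the
file as the coefficients of a polynomial over a finite field, and let each
packet carry an evaluation of that polynomial at a fresh point. Any k
distinct evaluations determine a degree-(k-1) polynomial, and hence the
file; the new part of my work is doing that inversion quickly rather than
in the textbook quadratic (or worse) time.

#include <stddef.h>
#include <stdint.h>

#define P 0x7fffffffu                      /* 2^31 - 1, a Mersenne prime */

/* (a * b) mod P; 2^31 mod P = 1, so fold the high bits back in twice */
static uint32_t mul_mod_p(uint32_t a, uint32_t b)
{
    uint64_t t = (uint64_t)a * b;          /* < 2^62 */
    uint64_t s = (t & P) + (t >> 31);      /* < 2^32 */
    s = (s & P) + (s >> 31);               /* <= P */
    return (uint32_t)(s == P ? 0 : s);
}

/* One encoded symbol: evaluate the "file polynomial" at the point x using
 * Horner's rule.  data[0..k-1] are the file's symbols, each < P.  In the
 * real thing a packet holds many such symbols and the field has ~2^62
 * elements, but the principle is the same. */
static uint32_t encode_symbol(const uint32_t *data, size_t k, uint32_t x)
{
    uint32_t acc = 0;
    while (k--)
        acc = (uint32_t)(((uint64_t)mul_mod_p(acc, x) + data[k]) % P);
    return acc;
}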

-- 
Wesley W. Terpstra




Re: Reproducible, precompiled .o files: what say policy+gpl?

2004-10-18 Thread Wesley W. Terpstra
On Mon, Oct 18, 2004 at 07:45:39PM -0400, Glenn Maynard wrote:
> On Tue, Oct 19, 2004 at 12:59:42AM +0200, Wesley W. Terpstra wrote:
> > To which, I say, wtf!
> You're using it wrong.

Well thank goodness, b/c otherwise that would be really awful. :)
This gives me a great source to compare my algorithm's speed against.
Thank you again for showing it to me, and straightening out my misuse.

> [instructions which I followed and worked]
> This is exactly what you describe.

Yep! That's what Reed-Solomon codes can do.
I never claimed to invent them; as I said earlier, my work was on creating
an algorithm which could decode >5GB of data broken into packets (which for
me means small packets of network size).

I suggest you try:
dd if=/dev/urandom of=testing bs=16 count=1048576
split -a 3 -b 1024 testing testing.part.
find -name testing.part.\* -print0 | xargs -0 parchive a -n 16384 testing.par 

... now, twiddle your thumbs.
After about a minute, you will see it process the first file.
Another 30s will get you the second.
You have 16382 to go.

If you strace it, you will see there are no system calls being performed at
this time, so the slowness is not due to the large directory. Also, the
processor is fully loaded.

This is only for _encoding_, which is much easier than decoding.
You are also only doing it over 16MB.
My algorithm can handle 3 orders of magnitude more data much faster.

I would wager that par is using the Berklekamp-Masey algorithm for decoding; 
this is the most popular algorithm for RS codes at the moment. 
This algorithm has time O(n^2) for n 'parts'.
My algorithm has time O(n log n) and a not-too-terrible constant.
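(Back-of-envelope, ignoring constant factors: for the 16384-part example
above, n^2 is about 2.7*10^8 symbol operations, while n*log2(n) is about
2.3*10^5, i.e. roughly a factor of a thousand less.)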

Perhaps I should make my program 'par' command-line compatible!
OTOH, when you have so many small files it is not convenient.

Thank you very very much for bringing this implementation to my attention.

-- 
Wesley W. Terpstra




Re: Reproducible, precompiled .o files: what say policy+gpl?

2004-10-18 Thread Wesley W. Terpstra
On Tue, Oct 19, 2004 at 02:49:09AM +0200, Wesley W. Terpstra wrote:
> I would wager that par is using the Berklekamp-Masey algorithm for decoding; 
That would be Berlekamp-Massey. Apologies to both.
I should add their names to my spell checker. =)

-- 
Wesley W. Terpstra




Re: Reproducible, precompiled .o files: what say policy+gpl?

2004-10-18 Thread Wesley W. Terpstra
On Tue, Oct 19, 2004 at 02:49:09AM +0200, Wesley W. Terpstra wrote:
> find -name testing.part.\* -print0 | xargs -0 parchive a -n 16384 testing.par 

After taking a look in the source code for par, I found this in rs.c:
|*| Calculations over a Galois Field, GF(8)

What does that mean? It means there are only 2^8 possible values to evaluate
the polynomial at. So, in fact, the above command will not even work, should
it terminate. However, this is not a problem with RS codes, just par.
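
For reference, arithmetic in GF(2^8) is just byte arithmetic: addition is
XOR, and multiplication is a shift-and-XOR reduced by an irreducible
degree-8 polynomial (0x11d is the usual choice for RS codes, though I have
not checked which polynomial par actually uses). A sketch:

#include <stdint.h>

/* Multiply in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11d) whenever the
 * degree reaches 8; whether par uses this exact polynomial is a guess. */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;                             /* addition = XOR */
        b >>= 1;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));  /* a *= x */
    }
    return r;
}

The point is only that the whole field has 2^8 elements, so a byte-oriented
RS code has at most a couple of hundred usable evaluation points, nowhere
near 16384 parts.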

I was also wrong about them using Berlekamp-Massey.
They use Gaussian elimination to compute the inverse.
This is complexity O(R^2 N), which is even worse (see rs.c:214).
R = number of input files/packets
N = total number of files that were in the output (so N >= R)
ie: N >= R = n -> O(n^3) vs. O(n^2) for Berlekamp-Massey or O(n log n) for mine.

OTOH, since they have at most 2^8 possible 'packets' (including the source), 
that keeps the complexity from killing them. --> 'only' 2^24 operations =)
Also on the plus side, this complexity is in the number of packets,
not in the amount of data to be processed.

Although I haven't checked, I would speculate from the simplicity of the
code that they use the normal matrix product algorithm, which means (at
best) O(LR) where L is the total length of the data, R is the number of
files.

As I mentioned, my code has an alphabet around 2^62.
Actually, it's (2^31-1)^2-1 .. but that's almost the same.
Handling potentially so many more packets means you need a new algorithm.

Still, par is very cool and I will liberally lift usability ideas from it.
... and, of course, use it for unfair comparisons in my paper. =)

-- 
Wesley W. Terpstra




Re: Reproducible, precompiled .o files: what say policy+gpl?

2004-10-19 Thread Wesley W. Terpstra
I know this thread has progressed beyond the actual situation 
I asked about, but I wanted to just throw in my opinion too.

On Tue, Oct 19, 2004 at 09:13:24AM +0200, Andreas Barth wrote:
> A program is IMHO not only specified by the fact that it does certain
> transformations from input to output, but also by the speed it does
> this. If this specification can be matched by gcc, why consider using
> icc at all? And if not, it requires icc.

This is now also my point of view.

When I started this thread, I also _felt_ that contrib was the correct 
place for my application, but didn't really know why. Now I can explain 
it better. The proposal of keeping one version in main and one in contrib
also addresses my concern about usability.

So, I am happy with the outcome of this discussion already. =)

-- 
Wesley W. Terpstra




Re: Reproducible, precompiled .o files: what say policy+gpl?

2004-10-19 Thread Wesley W. Terpstra
On Mon, Oct 18, 2004 at 11:05:16PM -0400, Glenn Maynard wrote:
> > find -name testing.part.\* -print0 | xargs -0 parchive a -n 16384 
> > testing.par 
> 
> You're splitting into parts which are far too small.

Yes, it's too small for par.
It's also clear that it's too small for usenet use.

However, my program is not intended to solve usenet problems.
My applications are all on the level of packet switching networks.

Besides, who ever complains when something is faster than they need?

Or as my first computer science prof once said:
  Insertion sort is fine for most tasks.
  Sometimes you need quicksort.

> It's not designed for thousands of tiny parts

No, but mine is.

> Most PAR operations are IO-bound (judging by drive thrashing, not from
> actual benchmarks).

Not to be rude, but you're mistaken here.
strace it when there are many small files; it is not doing syscalls.
Disk IO and/or thrashing is not the issue for small files.

Maybe disk thrashing is a problem during normal par operation, but it is a
minor problem compared to the computation (for my goals).
[ As an aside, my algorithm is also streaming; it reads the 'file' in sequence
three times, so disk thrashing should not be a problem. ]

> I don't really understand the use of allowing thousands of tiny parts.
> What's the intended end use?

Note that PAR cannot help you if the unit of failure is very small.
Even one missing piece of a 'part' makes that 'part' useless.

Florian already mentioned multicast, and that is my first application. 

Another situation is one where you have any one-way network link 
(some crazy firewalls [my work; arg!!]).

Future (?) wireless networks might have base stations with a larger range
than the clients. Clients could still download (without ACKs) in this case.

Perhaps your ISP has packet loss that sometimes sits at 20% (my home; arg!).

If you know how TCP works, you will also know that it will nearly
stop sending because it thinks the network is congested, even though
the real problem is a faulty switch which drops 20% of packets seemingly
at random. Using my code over UDP completely removes this problem.

(However, this is dangerous because my code is also 'unfair' in the sense
that it will stop all TCP traffic b/c it will not care about packet loss 
due to congestion while the TCP traffic will back off.)

You might also use it to make a version of bittorrent where each packet is
independent of the others. This would help prevent 'dead' torrents where
there is no seed and all downloads stall b/c the known information overlaps.

Another case might be mobile agents where PDAs exchange parts of files they
are looking for whenever they run into other PDAs they can bargain with
(like bittorrent).

However, PDAs move when their owners move, so network sessions are
interrupted at random times, and one PDA may never see the other ever
again.

This scheme would let a PDA broadcast a file to all nearby PDAs which
could make use of the information regardless of when they leave (mid
'part'?) or whether they already have pieces of the file.

Another situation I would like to apply my code to is sensor networks,
where there is a stream of measurements of some variable. My code cannot
presently handle this correctly, but that is future work for me.

I am not a very imaginative person; I am sure there are many other
situations where this could be applied.

From another point of view, research doesn't *need* to be practical. ;)

If other people have ideas, I'd like to hear them.

-- 
Wesley W. Terpstra




Re: Reproducible, precompiled .o files: what say policy+gpl?

2004-10-20 Thread Wesley W. Terpstra
> > a) Accompany it with the complete corresponding machine-readable
> > source code, which must be distributed under the terms of Sections
> > 1 and 2 above on a medium customarily used for software interchange; or,
> [snip 3b and 3c]
> [snip OS exception]
> > 
> > If distribution of executable or object code is made by offering
> > access to copy from a designated place, then offering equivalent
> > access to copy the source code from the same place counts as
> > distribution of the source code, even though third parties are not
> > compelled to copy the source along with the object code.
> 
> So the question is whether a source package in main "accompanies" a
> binary package in contrib, and/or whether "equivalent access" is
> offered.  This is certainly questionable.  It would also depend on
> mirroring; for example, if contrib were ever moved to a different server
> (which has been debated in the past), this would become clearly false.
> 
> My advice would be this: unless the source is incredibly huge (such as
> with OO.o), then I don't think saving a few tens of MB is worth dealing
> with the questions and complexities this raises.

That's my conclusion too.

Thanks again!

-- 
Wesley W. Terpstra




Re: Reproducible, precompiled .o files: what say policy+gpl?

2004-10-20 Thread Wesley W. Terpstra
On Wed, Oct 20, 2004 at 02:06:46PM +0200, Wesley W. Terpstra wrote:
> On Tue, Oct 19, 2004 at 04:59:37PM -0700, Josh Triplett wrote:
> > Possibly; however, I think bandwidth grows far slower than CPU speed and
> > overall system power.  I do understand your concern, though.
>
> I intend to find out his source for these slides b/c this is very
> important to know.

Apparently this data comes from Jim Gray
http://research.microsoft.com/~Gray/
who works for Microsoft Research.

The information is current as of 99/00:
http://research.microsoft.com/~gray/papers/MS_TR_99_100_Rules_of_Thumb_in_Data_Engineering.pdf

I would love to see more recent data if anyone has it.

-- 
Wesley W. Terpstra




Re: Reproducible, precompiled .o files: what say policy+gpl?

2004-10-23 Thread Wesley W. Terpstra
On Sat, Oct 23, 2004 at 12:27:25PM -0600, Gunnar Wolf wrote:
> Wesley W. Terpstra said [Mon, Oct 18, 2004 at 09:59:36PM +0200]:
> > At this point my question is only academic; the pure-gcc in main,
> > icc-prebuilt in contrib solution seems to solve my concerns just as well.
> 
> I have only one concern with this: What happens if you drop the
> package and someone else takes it? He will no longer be able to
> compile it with icc, and the icc-prebuilt users will be left out in
> the cold. What would you say to that?

He can upload a version to contrib which depends on the version in main and
has no contents. Then the icc users are automatically converted to gcc.

Or else, if he is an open-source developer who makes no money from his
debian work, he can download icc from their site for free.
Just universities and paid researchers like me have to pay. Sniff.

-- 
Wesley W. Terpstra




GPL and command-line libraries

2004-11-02 Thread Wesley W. Terpstra
Good evening!

I'm developing an error-correcting code library which works on a lot of data
at once. Since the API is quite simple and the cost of process creation
relatively insignificant, I would like to provide a command-line API.

I feel this has several engineering advantages:
1) it's easier to debug and understand if you can interact with it
2) bugs in the library don't crash programs using it
3) multithreading can be used in the library and not the application
... and other more problem-specific benefits.
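
For concreteness, calling the library through its command-line API from C
might look something like this (the program name rsgt-encode and its options
are hypothetical, not a real interface):

#include <stdio.h>

int main(void)
{
    /* Ask the (hypothetical) encoder for 1024-byte packets of foo.dat and
     * read them back over a pipe; no library code is linked in at all. */
    FILE *enc = popen("rsgt-encode --packet-size 1024 foo.dat", "r");
    if (!enc) { perror("popen"); return 1; }

    char packet[1024];
    while (fread(packet, 1, sizeof packet, enc) == sizeof packet) {
        /* ... hand the packet to the network layer ... */
    }
    return pclose(enc) == 0 ? 0 : 1;
}

This is the kind of caller the question below is about.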

To the point: I want this library to be released under the GPL, BUT ...
Would the GPL still protect it?

If someone writes a program that does popen("my-api", "r"),
does the GPL require that program to also be GPL?
From the short answer I got on IRC it seemed the answer was: No!

What I am concerned about is the following scenario:

Mr. John Wontshare writes a streaming multicast client.
To deal with packet loss, he uses my error-correcting library.
Without my library, Mr. Wontshare's client can't work at all.
Mr. Wontshare's client represents only a small investment of effort and
without having had access to my library, he could have never written it.
He then distributes his client along with my library to end-users.

These users don't get Mr. Wontshare's code, even though he uses my library.
Even worse, he refuses to port his client to MacOS X for business reasons.
(intentionally giving an unfair competitive advantage to another platform)

To me anyways, this sounds like exactly the situation the GPL is supposed to
protect against. Is this _not_ a derivative work?

If that's really the case, is it possible that a GPLv3 might address this?

There are several things I've considered to prevent this scenario:

1. Write in all documentation, help, etc: "popen my app = derivative work"
   ... and hope that this is enough to give me a victory in a lawsuit or at
   least scare Mr. Wontshare away from even trying this.
2. Patenting the new algorithm my library uses and putting in a clause which
   covers this corner-case and making it otherwise free.
3. Crafting a special (GPL-incompatible) licence which does what I want.
4. Writing to debian-legal and asking for advice.

I've heard all sorts of arguments in IRC that drawing the line in a good
way is very hard. I believe that. However, what I want to know is, if this
went to court, would things like the intention and degree of dependency be
considered in determining if the client was a derivative work or not?

What can I do to prevent the above scenario from happening?

Thank you very much for your time!

-- 
Wesley W. Terpstra




Re: GPL and command-line libraries

2004-11-02 Thread Wesley W. Terpstra
On Tue, Nov 02, 2004 at 05:30:36PM -0500, Raul Miller wrote:
> Given that Mr. Wontshare's client represents only a small investment of
> effort, "refuses to port" doesn't sound like much of a problem.

I meant to say relatively small investment; sorry.
Even simple applications can be hard to rewrite though.
Especially if Mr. Wontshare is evil and uses encrypted traffic protected
by good old DRM.

> Mr. Wontshare (or someone else) puts your library behind a simply api
> and then builds some application which uses that api, and yet refuses
> to release his code.

I am aware of that; this is why none of my suggested remedies was
"don't make a simple API".

> Someone works with the ecc concepts behind your code and reimplements
> them in some proprietary code base.

I am perfectly fine with this.
If they put in the effort to write it themselves, all power to them.
I just don't want people who don't share to freeload off my work.

I personally hate any kind of algorithm patent, so I wouldn't opt for
that solution. I just included it as an option for completeness.

-- 
Wesley W. Terpstra <[EMAIL PROTECTED]>




Fakeroot to obsolete DESTDIR

2003-04-10 Thread Wesley W. Terpstra
After running into yet two more problems with staged installs a la DESTDIR,
I was reminded of an idea I originally had for fink packages.

... but, let's begin from the beginning.
Why is DESTDIR a problem?

---

1: libtool cannot relink inter-dependent libraries during a staged install.
2: some upstream packages don't provide any means for relocated installation
... probably others, but these are the two I ran into again today.

On the topic of 1, cvs libtool can do this with an undocumented command-line
flag. However, it still prefers the installed location over the staged
location. So, if you link to -L/.../debian/tmp/usr/lib -lfoo, a libfoo.so in
/usr/lib is preferred, which can cause subtle problems.

Furthermore, libtool has been custom-hacked in so many source packages that
it is not feasible to simply replace ltmain.sh as we do for config.* in
autoconf. Thus, we are stuck with the old libtool + spinoffs for some time.

These problems in libtool can be solved by hacking the ltmain.sh or .la
files by hand, which afaik is what everyone does. This compounds the problem
above.

In regards to 2, if everyone used automake, this would not be a problem.
However, many projects do not use automake, and some of them are even
correct in this decision. So, sometimes we have no DESTDIR.

Often, people end up having to search through all the Makefiles injecting
DESTDIR in various locations, hopefully catching them all. This makes large
.diff files which are problematic to maintain.

Other times, the build system doesn't use Makefiles at all, but bash
scripts, jam, ant, perl or others. In each of these cases, a special
solution must be hand-crafted by the packager in a potentially error-prone
manner.

Now, I admit the above problems are not fatal given the amount of man-power
at debian's disposal. In fact, our current solution---hand hacking---does
work fairly well.

What I would like to propose is simply a way to reduce the effort by
allowing plain 'make install' to work without changing the build scripts.

---

As we all know, fakeroot intercepts stat/chown/chmod/etc. We do this so that
users can install files into a staged location and preserve the correct
permissions.

What I propose to do is to slightly extend fakeroot to also intercept
open/diropen. If the open call would create a file, redirect it to
/.../debian/tmp or some such location. If the call would open a file, first
check /.../debian/tmp and then /.

To illustrate this, let's take an example: (here libbar depends on libfoo)

make install
libtool install libfoo.la /usr/lib/libfoo.la
libtool install libbar.la /usr/lib/libbar.la

The first install relinks libfoo.la against /lib/libc.so.6.

So, fakeroot intercepts ld's open, checks for /.../debian/tmp/lib/libc.so.6
and doesn't find it. Then it tries /lib/libc.so.6 and finds it, so provides
this file handle.

Then fakeroot intercepts ld's output open for /usr/lib/libfoo.la and
redirects it to /.../debian/tmp/usr/lib/libfoo.la. Hence, the library is
installed in the staging location transparently.

Next, libbar.la is installed.

fakeroot correctly prefers /.../debian/tmp/usr/lib/libfoo.la to
/usr/lib/libfoo.la in the case it was already installed. The output is also
staged.

So, here we have libtool working seamlessly because fakeroot redirected it.

... the beauty of this solution is that it is simple and common. This would
allow us to always install with the package's default install rule and have
magic look after all that nasty stuff.

The interface I propose is to add two new fakeroot options
--outputdir /.../debian/tmp
--searchdir /sw:/

If --outputdir is specified, then it is always prepended to the searchdir
option. When a new file is created, outputdir is the location really used.
When a file is opened, outputdir:searchdir is consulted.

One of the fringe benefits of this is that fink packages don't have to modify
the debian packages much; simply add /sw before / in the search path.

---

What I am looking for is comments about whether people think this is useful,
suggestions on how to make it as simple/easy-to-use as possible, and a list
of which libc functions need to be trapped.

Presently, I think libfakeroot.so just needs to trap open/opendir and maybe
a few others, faked needs to keep track of the search/outputdir parameters,
and fakeroot has to pass the above options to faked. opendir requires some
extra work to merge the directory listings, but I think this is doable.
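
To make the idea concrete, a minimal sketch of the libfakeroot.so side
(assuming the usual LD_PRELOAD trick, with hypothetical FAKEROOT_OUTPUTDIR
and FAKEROOT_SEARCHDIR variables standing in for --outputdir/--searchdir;
the real implementation would route this through faked and cover far more
calls):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    const char *out = getenv("FAKEROOT_OUTPUTDIR");
    char buf[4096];
    mode_t mode = 0;

    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    if (flags & O_CREAT) {                 /* fetch the optional mode */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    if (out) {
        snprintf(buf, sizeof buf, "%s/%s", out, path);
        /* creations and writes always land in the staging directory */
        if (flags & (O_CREAT | O_WRONLY | O_RDWR))
            return real_open(buf, flags, mode);
        /* reads consult the staging directory first */
        if (access(buf, F_OK) == 0)
            return real_open(buf, flags, mode);
    }
    return real_open(path, flags, mode);   /* fall back to the real path */
}

Something like 'gcc -shared -fPIC -o redirect.so redirect.c -ldl' plus
LD_PRELOAD would then redirect a plain 'make install' without touching the
build scripts; walking the searchdir list, covering fopen/creat/mkdir/rename,
and the opendir merging mentioned above are the missing pieces.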

---
Wes




Experimental queue?

2009-10-12 Thread Wesley W. Terpstra
The package MLton is a Standard ML compiler which is itself written in
Standard ML. To bootstrap the package building process on a new architecture
requires an initial by-hand cross-compile step (and occasionally some
source-level patching). Thus, the first upload for a new architecture must
be a manual upload of a built-by-hand package. Thereafter I need to confirm
that the autobuilders can build subsequent uploads themselves.

I intend to bootstrap a few more architectures for this package and wanted
to know if this would be an appropriate use for the experimental upload
queue. The intermediate packages are probably more unstable than what one
expects even from the unstable queue. I was hoping I could get some
information about the experimental upload queue as I have never used it:

* Do the autobuilders build packages uploaded as experimental? (eg: to
confirm a successful port)
* Is making an experimental upload really as easy as setting the distribution
in the changes file to experimental?
* Can a package uploaded to experimental be migrated to unstable?
 * I definitely don't want this to happen automatically
 * At some point I probably want to push the newest versions from
experimental to unstable (to facilitate building the new architectures) and
then upload a new 'final' version that gets autobuilt for all the new
targets, landing in unstable.

Finally, how can I determine which debian autobuilders have >1GB of RAM
(required for a successful build)?

Advice greatly appreciated.


Re: Experimental queue?

2009-10-14 Thread Wesley W. Terpstra
On Wed, Oct 14, 2009 at 10:55 AM, Goswin von Brederlow wrote:

> > * Do the autobuilders build packages uploaded as experimental? (eg: to
> > confirm a successful port)
> The experimental autobuilders do. I think not all archs have one.

Ok, sounds like I'll have to upload to unstable then after all.


>
> If you find an arch that has a buildd with >1GB and one with <1GB then
> please do contact the buildd admin to set the package to excluded on
> the smaller buildd.
>

Excellent suggestion, thanks.


Build logs from local builds

2009-10-21 Thread Wesley W. Terpstra
I find the buildd logs on https://buildd.debian.org/ to be extremely
useful. They are nicely organized and it's easy to look back in time
and see previous build problems and/or get a quick overview of the
current build status. However, I find there's one piece of data that
is sadly missing: the log from my local build!

debuild and friends generate a .build file along with the .changes
file, but ftp-master obviously doesn't do anything with said build
file. I think it would be quite useful if there were a way for a
maintainer to upload the build log from his own local system. This
would allow interested users to check the logs (in case they suspect
some sort of problem) as well as help maintainers out by giving them a
more complete record of version builds.

Of course, one could argue that the developer doesn't need his own
logs on buildd.debian.org, since he has them locally. However,
sometimes it might be nice to check the status from a different
machine via the web when confronted with an unusual problem. Also, the
buildd systems are more reliable and better backed up than most developers'
private systems. Finally, as I mentioned, a user might want to see the
logs too.

What do other people think? Should this be possible? Should this be required?





new buildd dependency resolution breaks self depends?

2011-03-29 Thread Wesley W. Terpstra
I've read that there was a recent change made to the buildd resolution with
regards to ensuring that consistent package versions are used on the builds
[0]. Is it possible that this change also messed up self-dependency
resolution?

My package, mlton, has a versioned dependency on itself for version >=
20070826. As it is a compiler for SML written in SML, it needs a previous
version of itself installed in order to compile the new version. Previously,
this has presented no problems; the buildd installed the old version and
compiled the new version. Now, the buildd demands that the same version be
installed as is to be built [1]:

*mlton/alpha dependency installability problem:*

  mlton (= 20100608-3) build-depends on one of:
  - mlton (= 20100608-3)

... this is, of course, impossible. The buildd must install the old version
in order to build the new. I have a suspicion that an overzealous 'use the
same version' rule in the dependency resolver might be the cause of this
bug.

Thanks for any help understanding why the buildd system will no longer
attempt to build my package!

[0] http://lists.debian.org/debian-policy/2011/03/msg00103.html
[1] https://buildd.debian.org/status/package.php?p=mlton


Re: new buildd dependency resolution breaks self depends?

2011-03-29 Thread Wesley W. Terpstra
On Tue, Mar 29, 2011 at 5:52 PM, Julien Cristau  wrote:

> As far as I can tell the problem is that you switched the mlton binary
> package to 'Architecture: all'.  Which means it's available on all
> architectures already in the new version, even though it's not
> installable.
>

Ahh! That makes a lot of sense, thanks.

I'll need to figure out a way to work around this.


Re: new buildd dependency resolution breaks self depends?

2011-03-29 Thread Wesley W. Terpstra
On Tue, Mar 29, 2011 at 6:42 PM, Kurt Roeckx  wrote:

> As long as the Packages file for the buildds mentions this arch
> all package, no buildd can build it, because it only considers
> installing the latest version.  But it should get removed
> from that file after 24 or 32 hours or something.  In which case
> we'll only see the old version, can install those, and things should
> work from there.
>

I hope what you're telling me is true, because it will save me a lot of
work! :)

What I don't understand about your explanation: once the new all+i386 .debs
hit unstable, won't the buildds see the new 'all' package in unstable and
thus want to install it in preference to the old 'any' package even after it
is removed from the Packages file? The 'all' package will still be
uninstallable since it depends on the missing 'any' packages.

While I can fix the problem at hand by removing the mlton 'all' package for
an upload,  I see a more troublesome problem on the horizon:

The basis, runtime, and compiler packages should all be at the same version
to compile correctly. The basis package is an 'all' package which includes
the cross-platform bits of the runtime library. The runtime and compiler are
'any' packages with compiled object code.

If the Build-Depends lists 'mlton-compiler' (ie: after I resolve the current
problem), any future uploads will see that it has these versions available:
mlton-compiler (= old-version) depends on runtime
mlton-runtime (= old-version) depends on basis
mlton-basis (= new version)
... which I believe means that the old-version mlton-compiler package will
be uninstallable since the old-version of the basis in unstable is hidden by
the new-version.

Have I understood this problem correctly?


Re: new buildd dependency resolution breaks self depends?

2011-03-29 Thread Wesley W. Terpstra
On Tue, Mar 29, 2011 at 7:27 PM, Kurt Roeckx  wrote:

> Note that in unstable you don't see the arch arch all version
> until the arch any version is also available.  Or you would see
> the old arch all version until the new arch any version is
> available.
>

That's great! My thanks to whoever had the foresight to prevent this
temporary dependency breakage for all->any dependencies. I guess this would
otherwise have annoyed unstable users for packages that had yet to be built
for their architecture..?

> This means that the version from unstable should always be
> installable, unless there is some other reason it's not like
> a transition of some other library.

Yes, the libgmp3-dev -> libgmp-dev transition already bit me this way. I
assumed I was in for more of the same with the self dependency.

> The problem is that the buildds currently also see the newer
> arch all version.  But this version will go away after some
> time and it will only see the version from unstable.

If I may ask, for what purpose do the buildds have a special list of
packages above and beyond those in unstable?

> The new version of mlton-basis will only be visible to the buildds
> for about a day, after which they should have no problem building
> it.

Thank god. :)


Re: [buildd-tools-devel] new buildd dependency resolution breaks self depends?

2011-03-29 Thread Wesley W. Terpstra
On Tue, Mar 29, 2011 at 7:10 PM, Lennart Sorensen <lsore...@csclub.uwaterloo.ca> wrote:

> Does mlton-basis depend on mlton-runtime or mlton-compiler to build?
> If the answer is yes, then most likely these should not be three seperate
> source packages.

It's all one source package. I split up the binaries because:
1) about 60% of the package could be in an 'all' package.
2) the runtime components for different architectures can be installed
side-by-side... thus enabling cross-compilation.

> If no, then why doesn't it just work or is the problem a previous version
> causing a mess?

According to Kurt, there is no problem. It's all in my head. :)


Re: new buildd dependency resolution breaks self depends?

2011-03-30 Thread Wesley W. Terpstra
On Tue, Mar 29, 2011 at 8:03 PM, Kurt Roeckx  wrote:

> On Tue, Mar 29, 2011 at 07:54:59PM +0200, Wesley W. Terpstra wrote:
> > If I may ask, for what purpose do the buildds have a special list of
> > packages above and beyond those in unstable?
>
> So that in case various packages have to be build in an order,
> where the seconds depends on the first being available and so on,
> that it doesn't take weeks to get them all build.  We would have
> to wait at least a dinstall before the next one could be build,
> assuming sometimes has the time to sign the package between
> dinstalls.
>
> It basicly just avoids a whole lot of delays.
>

Unfortunately, it also seems to add quite some delay in the self-compiling
case. :-/ Each time a buildd finishes, that buildd's Packages file gets
updated due to the completed binary upload and all other buildds go back
into the BD-Uninstallable state. (I assume this also means the package loses
its place in line on the busy buildd queues)

I wonder if the same rules applied to the unstable package list (don't
include the all for a package whose any is not done) could be applied also
to the buildd's Packages?


mlton any->all package transition breakage

2011-03-31 Thread Wesley W. Terpstra
Good afternoon.

I am the maintainer for the Standard ML 97 (SML) compiler mlton. This
compiler is itself written in SML and is self-hosting. Thus, it needs an
older version of the compiler in order to bootstrap itself. Further
complicating things, the build needs in the ballpark of 1-2GB of physical
memory for 32- and 64-bit architectures; otherwise the build will cause the
host machine to swap itself to death. Over the years I have slowly increased the
number of supported architectures in debian via a combination of
cross-compilation and binary uploads. At the moment, every major debian
architecture is supported.

Recently I had to prepare a new upload due to the gmp transition and took
the opportunity now that squeeze is released to split out the
arch-independent components of this monolithic package. Unfortunately, this
had unforeseen consequences on the buildd system. The problem is that the
old 'any' package (20100608-2) got removed from unstable before the new
package's (20100608-3) buildd runs completed; only the amd64 buildd was fast
enough. I am not entirely clear on the cause, but the consequence is clear
enough: the buildds can no longer install the old version of mlton needed to
bootstrap the new version.

It has been proposed to me to manually rebuild the package on every debian
architecture and then binary upload the result. To that end, I request
installation in a sid chroot of these packages from unstable: libgmp-dev
htmldoc texlive-latex-base procps debhelper cdbs quilt joe. Additionally,
please install from squeeze (should still install cleanly in sid chroot) the
package: mlton.

I request the above packages to be installed on these machines:
albeniz.debian.org    alpha           8g    y
abel.debian.org       armel           1.5g  y
merulo.debian.org     ia64            8g    y
asdfasdf.debian.net   kfreebsd-amd64  2g    y  ad...@asdfasdf.debian.net
io.debian.net         kfreebsd-i386   1.5g  y  ad...@io.debian.net
gabrielli.debian.org  mips            1.6g  y
zelenka.debian.org    s390            1g    y
smetana.debian.org    sparc           2g    y

Unfortunately, a look over the currently available porterboxes shows that
not every architecture can be fixed this way: paer does not have a sid
chroot, strauss has insufficient memory, mipsel has no porterbox at all, and
pescetti has insufficient memory. This means that I cannot rebuild the
package for: hppa, hurd-i386, mipsel, powerpc. The available buildd machines
*can* rebuild the package on these architectures, but will not do so as long
as the old version is missing from unstable.

I believe strauss has configurable main memory. If it could be temporarily
given 1.5G, then that would solve hurd-i386.
Would it be possible to get a sid dchroot setup on paer? If yes, that's
another architecture fixed.

I am looking for a solution to this build problem for mipsel and powerpc. If
the old mlton 'any' package (still in squeeze) were re-added to unstable,
that would work (and also render the above package installation requests
unnecessary). I'm open to any other suggestions.

One option I have considered: by-hand, rip the contents out of the old mlton
'any' package and rebundle the old contents as the "new version" and do a
binary upload. This way I could get packages for powerpc and mipsel that
would work to properly bootstrap a new upload on the buildds. This is a
pretty nasty hack and would mean that the sources do not match the binaries
for this one uploaded version, but this might be acceptable as a
transitional step...?

Any help appreciated!

PS. I could not determine which mailing list is haunted by the ftp-masters.
If debian-admin is wrong, please forward it.


Re: mlton any->all package transition breakage

2011-04-01 Thread Wesley W. Terpstra
On Fri, Apr 1, 2011 at 9:58 PM, Peter Palfrader  wrote:

> all the other chroots are now fucked because we did as you asked, and
> for some reason it wants to bring in mlton-doc


The problem there is you're trying to install and/or upgrade the sid one,
not the squeeze one.


> I think we won't be doing anything like that again any time soon.


Feel free to purge whatever mlton package you have installed.

I've been fairly successful building the package by installing the .debs
in my home directory.


Re: mlton any->all package transition breakage

2011-04-01 Thread Wesley W. Terpstra
On Fri, Apr 1, 2011 at 10:47 PM, Peter Palfrader  wrote:

> > The problem there is you're trying to install and/or upgrade the sid one,
> > not the squeeze one.
>
> No, the squeeze package installed cleanly.  now apt-get update &&
> upgrade breaks.  That means the package is buggy.
>

Yes, I know it's buggy. That's why I'm trying to fix it.

It is missing a replaces/breaks, which leads to upgrade problems.
However, before I can upload a new version that fixes that, I need to get a
working bootstrap version that the buildd will actually install.