Reproducible, precompiled .o files: what say policy+gpl?
I am developing a very CPU-intensive, open-source error-correcting code. The intention of this code is that you can split a large (>5GB) file across multiple packets. Whenever you receive enough packets that their combined size equals the file size, you can decode the packets to recover the file, regardless of which packets you get. This means a lot of calculation over gigabytes worth of data, so speed is of the utmost importance in this application.

The project itself includes an OCaml compiler (derived from FFTW) which generates C code to perform various types of Fast Galois Transforms. Some of the output C code uses SSE2 exclusively. This C code is then compiled and linked in with the other C sources that make up the application.

Now, on to the dilemma: icc produces object files which run ~2x faster than the object files produced by gcc when SSE2 is used. (The non-SSE2 versions are also faster, but not so significantly.) Both gcc and icc can compile the generated C files. My university will shortly own a licence for icc which allows us to distribute binaries.

So, when it comes time to release this and include it in a .deb, I ask myself: what would happen if I included (with the C source and OCaml compiler) some precompiled object files for i386? As long as the build target is i386, these object files could be linked in instead of using gcc to produce (slower) object files. This would mean a 2x speedup for users, which is vital in order to reach line speed. Other platforms recompile as normal.

On the other hand, is this still open source? Is this allowed by policy? Can this go into main? Some complaints and my answers below:

C: How do we know the object files aren't trojaned?
A: Because I am both the upstream developer and (will be) the Debian maintainer, and I say they aren't.

C: You can't recompile the application without icc, which is not free.
A: You can still rebuild it with gcc.

C: But you can't rebuild _exactly_ the same binary.
A: This is essentially *my* question: is this required by policy/GPL? Remember, you can always get icc yourself. If there is a GPL problem, then I think no MSVC application can be GPL either.

C: You're just too lazy to hand-optimize the assembler and include that.
A: You're right. Some of those auto-generated C files are >64k of completely incomprehensible math. I could include .S files instead of .o files, though, if that helps.

C: You're just too lazy to fix gcc.
A: I also wouldn't know where to begin, and I already file bugs. Even if I did know where to begin, gcc is not my responsibility.

C: A (security) bugfix won't get linked in.
A: A bug in the auto-generated C code is unlikely, and if there were one, changing the .c file makes it newer than the .o, which means gcc will rebuild it.

That's it! What are the thoughts of GPL and policy experts?

PS. I will provide the source code to anyone who requests it, but not yet under the GPL. Only after I publish a paper about the algorithm will the code be released under the GPL.

-- Wesley W. Terpstra <[EMAIL PROTECTED]>
Re: Reproducible, precompiled .o files: what say policy+gpl?
Since there's one GPL question left, I am still posting to debian-legal. The legal question is marked ** for those who want to skip the rest. On Mon, Oct 18, 2004 at 11:49:56AM -0700, Josh Triplett wrote: > Whether your university owns a license or not does not really affect > Debian. icc cannot be included in Debian main. No, but debian can distribute precompiled object files (legally). The binaries I meant were the object files. > Keep in mind that if your algorithm is as good as it sounds, it will be > around for a long time. Even if a GCC-compiled version can't achieve > line-speed right now, if all it needs is a 2x speedup, normal increases > in computer technology will provide that soon enough. True enough, but as processors get faster, so does bandwidth. I expect that ultimately, it will always need to be as fast as possible. > Consider this: any package with non-free Build-Depends that aren't > strictly required at runtime could take this approach, by shipping > precompiled files. For example, this has come up several times with > Java packages that tried to just ship a (Sun/Blackdown-compiled) .jar > file in the source package. The answer here is the same: you can't ship > compiled files to avoid having a non-free build-depends (and shouldn't > ship compiled files at all, even if they were compiled with a Free > compiler); the package should always be built from source. That is a good argument; thank you. > * Upload a package to main which builds using GCC. (As a side note, you > might check to see if GCC 3.4/3.5 produces significantly better code.) gcc-3.3 is not an issue; it ICEs. gcc-3.4.2 is the version I was referring to. > * Make it easy for people to rebuild using icc. See the openoffice.org > packages for an example; they contain support for rebuilding using a > non-free JDK based on a flag in DEB_BUILD_OPTIONS. That's a good idea. 
> * Supply icc-built packages either on your people.debian.org site or in
> contrib; if the latter, you need to use a different package name and
> conflict with the gcc-built package in main.

Josselin Mouette <[EMAIL PROTECTED]> said:
> If you really want to distribute a package built with icc, you should
> make a separate package in the contrib section, and have it conflict
> with the package in main.

Yes, this sounds like a good plan. Put the normal gcc version, rsgt, in main, where the i386 deb has:
  Recommends: rsgt-icc
rsgt-icc sits in contrib, completely built by icc (not just some .o files), with:
  Conflicts: rsgt
  Provides: rsgt
  Replaces: rsgt
If an i386 user (with contrib sourced) runs 'apt-get install rsgt', will that make apt install rsgt-icc? That's what I hope to accomplish. (PS. rsgt is not the final name.)

** For it to sit in contrib, would I have to include the source code in contrib as well? Or would the fact that the source code was in main already satisfy the GPL requirement of source availability? Clearly, it could still sit in non-free without the source, but contrib is more accurate imo. If there's no reason to include the source twice, I see no reason to present 2x the load to the ftp servers.

> it is acceptable *under the GPL* to provide binaries compiled with
> non-free compilers, unless the resulting compiled binary is somehow
> derivative of a non-free work that is not an OS component. In the end, if
> people want to exercise their rights under the GPL, they will want the
> source, not the binaries, and you are supplying that source alongside the
> binaries, which satisfies the GPL.

Then I suppose it makes sense to just supply a precompiled version (with icc) and a source tarball as upstream. The Debian version would work as covered already above.

> > PS. I will provide the source code to anyone who requests it, but not yet
> > under the GPL. Only after I publish a paper about the algorithm will the
> > code be released under the GPL.
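For concreteness, the contrib stanza described above might look roughly like this (a sketch only: rsgt is the placeholder name from this mail, and the Section, Architecture, and Description fields are my invention):

```
Package: rsgt-icc
Section: contrib/utils
Architecture: i386
Conflicts: rsgt
Provides: rsgt
Replaces: rsgt
Description: rsgt variant built with icc (~2x faster SSE2 code paths)
```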
> Keep in mind that FFTW is GPLed, so unless you have made other
> arrangements with its copyright holders, you need to refrain from
> supplying the code or binaries to anyone unless under the GPL.

Oh, that's a good point. I withdraw my offer of private pre-release. You can only have a copy after I publish. ;)

Thank you for your detailed explanation and answer. -- Wesley W. Terpstra
Re: Reproducible, precompiled .o files: what say policy+gpl?
On Mon, Oct 18, 2004 at 01:33:07PM -0500, John Hasler wrote: > Josselin Mouette writes: > > Main must be built with only packages from main. > > Packages in main must be _buildable_ with only packages from main. Interesting. This slight difference in wording sounds to me like I would indeed be able to include prebuilt object files, so long as the package could be built without them. Is that correct? The actual text in policy is: * must not require a package outside of main for compilation or execution (thus, the package must not declare a "Depends", "Recommends", or "Build-Depends" relationship on a non-main package) This wording appears to back up what you say (John). The clause 'must not require' is fine with my case. Since the source files can be rebuilt with gcc, icc is not required. Execution is a non-issue. At this point my question is only academic; the pure-gcc in main, icc-prebuilt in contrib solution seems to solve my concerns just as well. -- Wesley W. Terpstra
Re: Reproducible, precompiled .o files: what say policy+gpl?
On Mon, Oct 18, 2004 at 05:15:20PM -0400, Glenn Maynard wrote:
> Isn't this just what PAR and PAR2 do (in conjunction with a file splitter)?

Thanks for the pointer to this project; I didn't know about it. However, to answer your question: no. PAR2 uses Reed-Solomon codes; my project also uses a variant of Reed-Solomon. However, the result is completely different. PAR2 aims to correct corruption of data (as well as loss). Most differently, it also trades disk overhead for speed. My program does not attempt to correct corruption (though there are algorithms which can do so, just not efficiently). If corruption resilience is needed, you would have to add a CRC to each packet.

[ For those who know how Reed-Solomon works, I am applying Reed-Solomon over the entire file as one block -- using an alphabet of size ~2^62. Normally Reed-Solomon breaks the file into blocks of a fixed size and adds the correction to these small blocks. ]

For example, with par2, I have:
-rw-r--r-- 1 terpstra terpstra 627103 2004-10-19 00:18 foo.pdf
I ran: par2 create foo.pdf
It generated:
-rw-r--r-- 1 terpstra terpstra  40600 2004-10-19 00:19 foo.pdf.par2
-rw-r--r-- 1 terpstra terpstra  40980 2004-10-19 00:19 foo.pdf.vol000+01.par2
-rw-r--r-- 1 terpstra terpstra  81860 2004-10-19 00:19 foo.pdf.vol001+02.par2
-rw-r--r-- 1 terpstra terpstra 123120 2004-10-19 00:19 foo.pdf.vol003+04.par2
-rw-r--r-- 1 terpstra terpstra 165140 2004-10-19 00:19 foo.pdf.vol007+08.par2
-rw-r--r-- 1 terpstra terpstra 208680 2004-10-19 00:19 foo.pdf.vol015+16.par2
-rw-r--r-- 1 terpstra terpstra 255260 2004-10-19 00:19 foo.pdf.vol031+32.par2
-rw-r--r-- 1 terpstra terpstra 257540 2004-10-19 00:19 foo.pdf.vol063+38.par2
I then removed foo.pdf and tried: par2 recover foo.pdf.par2
...
You have 0 out of 2010 data blocks available.
You have 101 recovery blocks available.
Repair is not possible.
You need 1909 more recovery blocks to be able to repair.
To which, I say, wtf! The data I have totals:
du -sb .
1173540 .
That's even more than foo.pdf!
[EMAIL PROTECTED]:~/x/y$ du -bs ../foo.pdf
627103 foo.pdf
... and yet par2 tells me I need 1909 more (I have 101). That means I would need 18x more information in order to recover foo.pdf! I have to admit, I am surprised by this. I would have expected a better ratio, even with very old techniques.

My program is entirely different. Let's say it's called rsgt (I haven't decided on a name yet). You run:
rsgt 1024 foo.pdf
You then get files all of size 1024 (regardless of the size of foo.pdf). As long as you have enough files that the total size is the same or larger than the size of foo.pdf, you can recover foo.pdf. You can create nearly 2^62 different such files, and start generating them from any starting point. This means you could essentially send a never-ending stream of (network-sized) packets which people could 'tap into'. After getting any subsequence of packets, as long as you get enough so that their total is the file size, you are done.

This is what is different about my program: it has zero space overhead (well, it has an 8-byte-per-packet header, but it doesn't depend on packet length). What is new in terms of research is that I have an algorithm which can do the decoding quickly. rsgt aims not to add 'parity' as I think 'par'2 is intended to suggest. Rather, it transforms a file into a stream of user-defined-size packets which goes on practically forever. ANY of the packets will do to get back the original file, as long as the sum is the same size. -- Wesley W. Terpstra
Re: Reproducible, precompiled .o files: what say policy+gpl?
On Mon, Oct 18, 2004 at 07:45:39PM -0400, Glenn Maynard wrote:
> On Tue, Oct 19, 2004 at 12:59:42AM +0200, Wesley W. Terpstra wrote:
> > To which, I say, wtf!
> You're using it wrong.

Well thank goodness, b/c otherwise that would be really awful. :) This gives me a great source to compare my algorithm's speed against. Thank you again for showing me it, and straightening out my misuse.

> [instructions which I followed and worked]
> This is exactly what you describe.

Yep! That's what Reed-Solomon codes can do. I never claimed to invent them; as I said earlier, my work was on creating an algorithm which could decode >5GB of data broken into packets (which for me means small packets of network size). I suggest you try:
dd if=/dev/urandom of=testing bs=16 count=1048576
split -a 3 -b 1024 testing testing.part.
find -name testing.part.\* -print0 | xargs -0 parchive a -n 16384 testing.par
... now, twiddle your thumbs. After about a minute, you will see it process the first file. Another 30s will get you the second. You have 16382 to go. If you strace it, you will see there are no system calls being performed at this time, so the slowness is not due to the large directory. Also, the processor is fully loaded. This is only for _encoding_, which is much easier than decoding. You are also only doing it over 16MB. My algorithm can handle 3 orders of magnitude more data much faster.

I would wager that par is using the Berklekamp-Masey algorithm for decoding; this is the most popular algorithm for RS codes at the moment. This algorithm has time O(n^2) for n 'parts'. My algorithm has time O(nlogn) and a not too terrible constant. Perhaps I should make my program 'par' command-line compatible! OTOH, when you have so many small files it is not convenient.

Thank you very very much for bringing this implementation to my attention. -- Wesley W. Terpstra
Re: Reproducible, precompiled .o files: what say policy+gpl?
On Tue, Oct 19, 2004 at 02:49:09AM +0200, Wesley W. Terpstra wrote:
> I would wager that par is using the Berklekamp-Masey algorithm for decoding;

That would be Berlekamp-Massey. Apologies to both. I should add their names to my spell checker. =) -- Wesley W. Terpstra
Re: Reproducible, precompiled .o files: what say policy+gpl?
On Tue, Oct 19, 2004 at 02:49:09AM +0200, Wesley W. Terpstra wrote:
> find -name testing.part.\* -print0 | xargs -0 parchive a -n 16384 testing.par

After taking a look in the source code for par, I found this in rs.c:
|*| Calculations over a Galois Field, GF(8)
What does that mean? It means there are only 2^8 possible values to evaluate the polynomial at (the comment presumably means GF(2^8)). So, in fact, the above command will not even work, should it terminate. However, this is not a problem with RS codes, just par.

I was also wrong about them using Berlekamp-Massey. They use Gaussian elimination to compute the inverse. This is complexity O(R^2*N), which is even worse (see rs.c:214). R = number of input files/packets, N = total number of files that were in the output (so N >= R). I.e. N >= R = n -> O(n^3) vs. O(n^2) for Berlekamp-Massey or O(nlogn) for mine. OTOH, since they have at most 2^8 possible 'packets' (including the source), that keeps the complexity from killing them. --> 'only' 2^24 operations =) Also on the plus side, this is the complexity per number of packets, not the complexity of the data to be processed. Although I haven't checked, I would speculate from the simplicity of the code that they use the normal matrix product algorithm, which means (at best) O(L*R) where L is the total length of the data and R is the number of files.

As I mentioned, my code has an alphabet around 2^62. Actually, it's (2^31-1)^2-1 ... but that's almost the same. Handling potentially so many more packets means you need a new algorithm.

Still, par is very cool and I will liberally lift usability from it. ... and, of course, use it for unfair comparisons in my paper. =) -- Wesley W. Terpstra
Re: Reproducible, precompiled .o files: what say policy+gpl?
I know this thread has progressed beyond the actual situation I asked about, but I wanted to just throw in my opinion too. On Tue, Oct 19, 2004 at 09:13:24AM +0200, Andreas Barth wrote: > A program is IMHO not only specified by the fact that it does certain > transformations from input to output, but also by the speed it does > this. If this specification can be matched by gcc, why consider using > icc at all? And if not, it requires icc. This is now also my point of view. When I started this thread, I also _felt_ that contrib was the correct place for my application, but didn't really know why. Now I can explain it better. The proposal of keeping one version in main and one in contrib also addresses my concern about usability. So, I am happy with the outcome of this discussion already. =) -- Wesley W. Terpstra
Re: Reproducible, precompiled .o files: what say policy+gpl?
On Mon, Oct 18, 2004 at 11:05:16PM -0400, Glenn Maynard wrote:
> > find -name testing.part.\* -print0 | xargs -0 parchive a -n 16384
> > testing.par
> You're splitting into parts which are far too small.

Yes, it's too small for par. It's also clear that it's too small for usenet use. However, my program is not intended to solve usenet problems. My applications are all on the level of packet-switching networks. Besides, who ever complains when something is faster than they need? Or as my first computer science prof once said: insertion sort is fine for most tasks. Sometimes you need quicksort.

> It's not designed for thousands of tiny parts

No, but mine is.

> Most PAR operations are IO-bound (judging by drive thrashing, not from
> actual benchmarks).

Not to be rude, but you're mistaken here. strace it when there are many small files; it is not doing syscalls. Disk IO and/or thrashing is not the issue for small files. Maybe disk thrashing is a problem during normal par operation, but it is a minor problem compared to the computation (for my goals). [ As an aside, my algorithm is also streaming; it reads the 'file' in sequence three times, so disk thrashing should not be a problem. ]

> I don't really understand the use of allowing thousands of tiny parts.
> What's the intended end use?

Note that PAR cannot help you if the unit of failure is very small. Even one missing piece of a 'part' makes that 'part' useless. Florian already mentioned multicast, and that is my first application. Another situation is one where you have any one-way network link (some crazy firewalls [my work; arg!!]). Future (?) wireless networks might have base stations with a larger range than the clients. Clients could still download (without ACKs) in this case. Perhaps your ISP has packet loss that sometimes sits at 20% (my home; arg!).
If you know how TCP works, you will also know that it will nearly stop sending because it thinks the network is congested, even though the real problem is a faulty switch which drops 20% of packets seemingly at random. Using my code over UDP completely removes this problem. (However, this is dangerous because my code is also 'unfair' in the sense that it will stop all TCP traffic, b/c it will not care about packet loss due to congestion while the TCP traffic will back off.)

You might also use it to make a version of BitTorrent where each packet is independent of the others. This would help prevent 'dead' torrents where there is no seed and all downloads stall b/c the known information overlaps. Another case might be mobile agents where PDAs exchange parts of files they are looking for whenever they run into other PDAs they can bargain with (like BitTorrent). However, PDAs move when their owners move, so network sessions are interrupted at random times, and one PDA may never see the other ever again. This scheme would let a PDA broadcast a file to all nearby PDAs, which could make use of the information regardless of when they leave (mid 'part'?) or whether they already have pieces of the file.

Another situation I would like to apply my code to is sensor networks where there is a stream of measurements of some variable. My code cannot presently handle this correctly, but that is future work for me.

I am not a very imaginative person; I am sure there are many other situations where this could be applied. From another point of view, research doesn't *need* to be practical. ;) If other people have ideas, I'd like to hear them. -- Wesley W. Terpstra
Re: Reproducible, precompiled .o files: what say policy+gpl?
> > ...which must be distributed under the terms of Sections
> > 1 and 2 above on a medium customarily used for software interchange; or,
> [snip 3b and 3c]
> [snip OS exception]
> >
> > If distribution of executable or object code is made by offering
> > access to copy from a designated place, then offering equivalent
> > access to copy the source code from the same place counts as
> > distribution of the source code, even though third parties are not
> > compelled to copy the source along with the object code.
>
> So the question is whether a source package in main "accompanies" a
> binary package in contrib, and/or whether "equivalent access" is
> offered. This is certainly questionable. It would also depend on
> mirroring; for example, if contrib were ever moved to a different server
> (which has been debated in the past), this would become clearly false.
>
> My advice would be this: unless the source is incredibly huge (such as
> with OO.o), then I don't think saving a few tens of MB is worth dealing
> with the questions and complexities this raises.

That's my conclusion too. Thanks again! -- Wesley W. Terpstra
Re: Reproducible, precompiled .o files: what say policy+gpl?
On Wed, Oct 20, 2004 at 02:06:46PM +0200, Wesley W. Terpstra wrote: > On Tue, Oct 19, 2004 at 04:59:37PM -0700, Josh Triplett wrote: > > Possibly; however, I think bandwidth grows far slower than CPU speed and > > overall system power. I do understand your concern, though. > > I intend to find out his source for these slides b/c this is very > important to know. Apparently this data comes from Jim Gray http://research.microsoft.com/~Gray/ who works for Microsoft Research. The information is current as of 99/00: http://research.microsoft.com/~gray/papers/MS_TR_99_100_Rules_of_Thumb_in_Data_Engineering.pdf I would love to see more recent data if anyone has it. -- Wesley W. Terpstra
Re: Reproducible, precompiled .o files: what say policy+gpl?
On Sat, Oct 23, 2004 at 12:27:25PM -0600, Gunnar Wolf wrote: > Wesley W. Terpstra dijo [Mon, Oct 18, 2004 at 09:59:36PM +0200]: > > At this point my question is only academic; the pure-gcc in main, > > icc-prebuilt in contrib solution seems to solve my concerns just as well. > > I have only one concern with this: What happens if you drop the > package and someone else takes it? He will no longer be able to > compile it with icc, and the icc-prebuilt users will be left out in > the cold. What would you say to that? He can upload a version to contrib which depends on the version in main and has no contents. Then the icc users are automatically converted to gcc. Or else, if he is an open-source developer who makes no money from his debian work, he can download icc from their site for free. Just universities and paid researchers like me have to pay. Sniff. -- Wesley W. Terpstra
GPL and command-line libraries
Good evening!

I'm developing an error-correcting code library which works on a lot of data at once. Since the API is quite simple and the cost of process creation relatively insignificant, I would like to provide a command-line API. I feel this has several engineering advantages:
1) it's easier to debug and understand if you can interact with it
2) bugs in the library don't crash programs using it
3) multithreading can be used in the library and not the application
... and other more problem-specific benefits.

To the point: I want this library to be released under the GPL, BUT would the GPL still protect it? If someone writes a program that does popen('my-api'), does the GPL require that program to also be GPL? From the short answer I got on IRC, it seemed the answer was: No!

What I am concerned about is the following scenario: Mr. John Wontshare writes a streaming multicast client. To deal with packet loss, he uses my error-correcting library. Without my library, Mr. Wontshare's client can't work at all. Mr. Wontshare's client represents only a small investment of effort, and without having had access to my library, he could have never written it. He then distributes his client along with my library to end-users. These users don't get Mr. Wontshare's code, even though he uses my library. Even worse, he refuses to port his client to MacOS X for business reasons (intentionally giving an unfair competitive advantage to another platform).

To me anyways, this sounds like exactly the situation the GPL is supposed to protect against. Is this _not_ a derivative work? If that's really the case, is it possible that a GPLv3 might address this?

There are several things I've considered to prevent this scenario:
1. Write in all documentation, help, etc: "popen my app = derivative work" ... and hope that this is enough to give me a victory in a lawsuit, or at least scare Mr. Wontshare away from even trying this.
2.
Patenting the new algorithm my library uses and putting in a clause which covers this corner case, making it otherwise free.
3. Crafting a special (GPL-incompatible) licence which does what I want.
4. Writing to debian-legal and asking for advice.

I've heard all sorts of arguments in IRC that drawing the line in a good way is very hard. I believe that. However, what I want to know is: if this went to court, would things like the intention and degree of dependency be considered in determining whether the client was a derivative work or not? What can I do to prevent the above scenario from happening?

Thank you very much for your time! -- Wesley W. Terpstra
Re: GPL and command-line libraries
On Tue, Nov 02, 2004 at 05:30:36PM -0500, Raul Miller wrote:
> Given that Mr. Wontshare's client represents only a small investment of
> effort, "refuses to port" doesn't sound like much of a problem.

I meant to say relatively small investment; sorry. Even simple applications can be hard to rewrite, though. Especially if Mr. Wontshare is evil and uses encrypted traffic protected by good old DRM.

> Mr. Wontshare (or someone else) puts your library behind a simply api
> and then builds some application which uses that api, and yet refuses
> to release his code.

I am aware of that; this is why none of my suggested remedies were: don't make a simple API.

> Someone works with the ecc concepts behind your code and reimplements
> them in some proprietary code base.

I am perfectly fine with this. If they put in the effort to write it themselves, all power to them. I just don't want people who don't share to freeload off my work. I personally hate any kind of algorithm patent, so I wouldn't opt for that solution. I just included it as an option for completeness. -- Wesley W. Terpstra <[EMAIL PROTECTED]>
Fakeroot to obsolete DESTDIR
After running into yet two more problems with staged installs à la DESTDIR, I was reminded of an idea I originally had for fink packages. ... but let's begin from the beginning.

Why is DESTDIR a problem?
---
1: libtool cannot relink inter-dependent libraries during a staged install.
2: some upstream packages don't provide any means for relocated installation.
... probably others, but these are the two I ran into again today.

On the topic of 1, cvs libtool can do this with an undocumented command-line flag. However, it still prefers the installed location over the staged location. So, if you link to -L/.../debian/tmp/usr/lib -lfoo, a libfoo.so in /usr/lib is preferred, which can cause subtle problems. Furthermore, libtool has been custom-hacked in so many source packages that it is not feasible to simply replace ltmain.sh as we do for config.* in autoconf. Thus, we are stuck with the old libtool + spinoffs for some time. These problems in libtool can be solved by hacking the ltmain.sh or .la files by hand, which afaik is what everyone does. This compounds the problem above.

In regards to 2, if everyone used automake, this would not be a problem. However, many projects do not use automake, and some of them are even correct in this decision. So, sometimes we have no DESTDIR. Often, people end up having to search through all the Makefiles injecting DESTDIR in various locations, hopefully catching them all. This makes large .diff files which are problematic to maintain. Other times, the build system doesn't use Makefiles at all, but bash scripts, jam, ant, perl or others. In each of these cases, a special solution must be hand-crafted by the packager in a potentially error-prone manner.

Now, I admit the above problems are not fatal given the amount of man-power at Debian's disposal. In fact, our current solution---hand hacking---does work fairly well.
What I would like to propose is simply a way to reduce the effort by allowing plain 'make install' to work without changing the build scripts.
---
As we all know, fakeroot intercepts stat/chown/chmod/etc. We do this so that users can install files into a staged location and preserve the correct permissions. What I propose is to slightly extend fakeroot to also intercept open/diropen. If the open call would create a file, redirect it to /.../debian/tmp or some such location. If the call would open a file, first check /.../debian/tmp and then /.

To illustrate this, let's take an example (here libbar depends on libfoo):
make install
  libtool install libfoo.la /usr/lib/libfoo.la
  libtool install libbar.la /usr/lib/libbar.la
The first install relinks libfoo.la against /lib/libc.so.6. So, fakeroot intercepts ld's open, checks for /.../debian/tmp/lib/libc.so.6 and doesn't find it. Then it tries /lib/libc.so.6 and finds it, so provides this file handle. Then fakeroot intercepts ld's output open for /usr/lib/libfoo.la and redirects it to /.../debian/tmp/usr/lib/libfoo.la. Hence, the library is installed in the staging location transparently. Next, libbar.la is installed. fakeroot correctly prefers /.../debian/tmp/usr/lib/libfoo.la to /usr/lib/libfoo.la in the case it was already installed. The output is also staged. So, here we have libtool working seamlessly because fakeroot redirected it.

... the beauty of this solution is that it is simple and common. This would allow us to always install with the package's default install rule and have magic look after all that nasty stuff.

The interface I propose is to add two new fakeroot options:
  --outputdir /.../debian/tmp
  --searchdir /sw:/
If --outputdir is specified, then it is always prepended to the searchdir option. When a new file is created, outputdir is the location really used. When a file is opened, outputdir:searchdir is consulted.
One of the fringe benefits of this is that fink packages don't have to modify the Debian packages much; simply add /sw before / in the search path.
---
What I am looking for is comments about whether people think this is useful, suggestions on how to make it as simple/easy-to-use as possible, and a list of which libc functions need to be trapped. Presently, I think libfakeroot.so just needs to trap open/diropen and maybe a few others, faked needs to keep track of the search/outputdir parameters, and fakeroot has to pass the above options to faked. diropen requires some extra work to merge the directory listings, but I think this is doable.
---
Wes
Experimental queue?
The package MLton is a Standard ML compiler which is itself written in Standard ML. To bootstrap the package-building process on a new architecture requires an initial by-hand cross-compile step (and occasionally some source-level patching). Thus, the first upload for a new architecture must be a manual upload of a built-by-hand package. Thereafter I need to confirm that the autobuilders can build subsequent uploads themselves. I intend to bootstrap a few more architectures for this package and wanted to know if this would be an appropriate use for the experimental upload queue. The intermediate packages are probably more unstable than what one expects even from the unstable queue.

I was hoping I could get some information about the experimental upload queue, as I have never used it:
* Do the autobuilders build packages uploaded as experimental? (eg: to confirm a successful port)
* Is making an experimental upload really as easy as setting the changes file to experimental?
* Can a package uploaded to experimental be migrated to unstable?
  * I definitely don't want this to happen automatically.
  * At some point I probably want to push the newest versions from experimental to unstable (to facilitate building the new architectures) and then upload a new 'final' version that gets autobuilt for all the new targets, landing in unstable.

Finally, how can I determine which Debian autobuilders have >1GB of RAM (required for a successful build)?

Advice greatly appreciated.
Re: Experimental queue?
On Wed, Oct 14, 2009 at 10:55 AM, Goswin von Brederlow wrote:
>> * Do the autobuilders build packages uploaded as experimental? (eg: to
>> confirm a successful port)
> The experimental autobuilders do. I think not all archs have one.

Ok, sounds like I'll have to upload to unstable then after all.

> If you find an arch that has a buildd with >1GB and one with <1GB then
> please do contact the buildd admin to set the package to excluded on
> the smaller buildd.

Excellent suggestion, thanks.
Build logs from local builds
I find the buildd logs on https://buildd.debian.org/ to be extremely useful. They are nicely organized, and it's easy to look back in time to see previous build problems and/or get a quick overview of the current build status. However, I find there's one piece of data that is sadly missing: the log from my local build! debuild and friends generate a .build file along with the .changes file, but ftp-master obviously doesn't do anything with said build file. I think it would be quite useful if there were a way for a maintainer to upload the build log from his own local system. This would allow interested users to check the logs (in case they suspect some sort of problem), as well as help maintainers by giving them a more complete record of version builds.

Of course, one could argue that the developer doesn't need his own logs on buildd.debian.org, since he has them locally. However, sometimes it might be nice to check the status from a different machine via the web when confronted with an unusual problem. Also, the buildd systems are more reliable and better backed up than most developers' private systems. Finally, as I mentioned, a user might want to see the logs too.

What do other people think? Should this be possible? Should this be required?
new buildd dependency resolution breaks self depends?
I've read that there was a recent change made to the buildd dependency resolution with regards to ensuring that consistent package versions are used in builds [0]. Is it possible that this change also broke self-dependency resolution? My package, mlton, has a versioned build-dependency on itself for version >= 20070826. As it is a compiler for SML written in SML, it needs a previous version of itself installed in order to compile the new version. Previously, this presented no problems; the buildd installed the old version and compiled the new version. Now, the buildd demands that the same version be installed as is to be built [1]:

  mlton/alpha dependency installability problem:
  mlton (= 20100608-3) build-depends on one of:
  - mlton (= 20100608-3)

... which is, of course, impossible. The buildd must install the old version in order to build the new. I suspect that an overzealous 'use the same version' rule in the dependency resolver might be the cause of this bug. Thanks for any help understanding why the buildd system will no longer attempt to build my package!

[0] http://lists.debian.org/debian-policy/2011/03/msg00103.html
[1] https://buildd.debian.org/status/package.php?p=mlton
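For context, the self-dependency lives in debian/control. A hypothetical, abridged sketch (field values illustrative, not the actual package's full control file); the versioned build-dependency on mlton itself is what previously let the buildd satisfy the build with any installed mlton >= 20070826:

```
Source: mlton
Section: devel
Build-Depends: mlton (>= 20070826), libgmp-dev, debhelper, cdbs, quilt
```

A strict "install exactly the version being built" rule in the resolver can never satisfy such a dependency, since that version by definition does not exist yet.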
Re: new buildd dependency resolution breaks self depends?
On Tue, Mar 29, 2011 at 5:52 PM, Julien Cristau wrote:
> As far as I can tell the problem is that you switched the mlton binary
> package to 'Architecture: all'. Which means it's available on all
> architectures already in the new version, even though it's not
> installable.

Ahh! That makes a lot of sense, thanks. I'll need to figure out a way to work around this.
Re: new buildd dependency resolution breaks self depends?
On Tue, Mar 29, 2011 at 6:42 PM, Kurt Roeckx wrote:
> As long as the Packages file for the buildds mentions this arch
> all package, no buildd can build it, because it only considers
> installing the latest version. But it should get removed
> from that file after 24 or 32 hours or something. In which case
> we'll only see the old version, can install those, and things should
> work from there.

I hope what you're telling me is true, because it will save me a lot of work! :)

What I don't understand about your explanation: once the new all+i386 .debs hit unstable, won't the buildds see the new 'all' package in unstable and thus want to install it in preference to the old 'any' package, even after it is removed from the Packages file? The 'all' package will still be uninstallable, since it depends on the missing 'any' packages.

While I can fix the problem at hand by removing the mlton 'all' package for an upload, I see a more troublesome problem on the horizon: the basis, runtime, and compiler packages must all be at the same version to compile correctly. The basis package is an 'all' package which includes the cross-platform bits of the runtime library. The runtime and compiler are 'any' packages with compiled object code. If the Build-Depends lists 'mlton-compiler' (ie: after I resolve the current problem), any future uploads will see that these versions are available:

  mlton-compiler (= old-version), which depends on the runtime
  mlton-runtime  (= old-version), which depends on the basis
  mlton-basis    (= new-version)

... which I believe means that the old-version mlton-compiler package will be uninstallable, since the old version of the basis in unstable is hidden by the new version. Have I understood this problem correctly?
Re: new buildd dependency resolution breaks self depends?
On Tue, Mar 29, 2011 at 7:27 PM, Kurt Roeckx wrote:
> Note that in unstable you don't see the arch all version
> until the arch any version is also available. Or you would see
> the old arch all version until the new arch any version is
> available.

That's great! My thanks to whoever had the foresight to prevent this temporary dependency breakage for all->any dependencies. I guess this would otherwise have annoyed unstable users for packages that had yet to be built for their architecture..?

> This means that the version from unstable should always be
> installable, unless there is some other reason it's not, like
> a transition of some other library.

Yes, the libgmp3-dev -> libgmp-dev transition already bit me this way. I assumed I was in for more of the same with the self-dependency.

> The problem is that the buildds currently also see the newer
> arch all version. But this version will go away after some
> time and it will only see the version from unstable.

If I may ask, for what purpose do the buildds have a special list of packages above and beyond those in unstable?

> The new version of mlton-basis will only be visible to the buildds
> for about a day, after which they should have no problem building
> it.

Thank god. :)
Re: [buildd-tools-devel] new buildd dependency resolution breaks self depends?
On Tue, Mar 29, 2011 at 7:10 PM, Lennart Sorensen <lsore...@csclub.uwaterloo.ca> wrote:
> Does mlton-basis depend on mlton-runtime or mlton-compiler to build?
> If the answer is yes, then most likely these should not be three separate
> source packages.

It's all one source package. I split up the binaries because:
1) about 60% of the package could go in an 'all' package.
2) the runtime components for different architectures can be installed side-by-side, thus enabling cross-compilation.

> If no, then why doesn't it just work, or is the problem a previous version
> causing a mess?

According to Kurt, there is no problem. It's all in my head. :)
Re: new buildd dependency resolution breaks self depends?
On Tue, Mar 29, 2011 at 8:03 PM, Kurt Roeckx wrote:
> On Tue, Mar 29, 2011 at 07:54:59PM +0200, Wesley W. Terpstra wrote:
>> If I may ask, for what purpose do the buildds have a special list of
>> packages above and beyond those in unstable?
>
> So that in case various packages have to be built in an order,
> where the second depends on the first being available and so on,
> it doesn't take weeks to get them all built. We would have
> to wait at least a dinstall before the next one could be built,
> assuming someone has the time to sign the package between
> dinstalls.
>
> It basically just avoids a whole lot of delays.

Unfortunately, it seems also to add quite some delays in the self-compiling case. :-/ Each time a buildd finishes, that buildd's Packages file gets updated due to the completed binary upload, and all other buildds go back into the BD-Uninstallable state. (I assume this also means the package loses its place in line on the busy buildd queues.) I wonder if the same rule applied to the unstable package list (don't include the 'all' for a package whose 'any' is not done) could also be applied to the buildds' Packages?
mlton any->all package transition breakage
Good afternoon. I am the maintainer for the Standard ML 97 (SML) compiler mlton. This compiler is itself written in SML and is self-hosting; thus, it needs an older version of the compiler in order to bootstrap itself. Further complicating things, the build needs in the ballpark of 1-2GB of physical memory on 32- and 64-bit architectures; otherwise the build will cause the host machine to swap till death.

Over the years I have slowly increased the number of supported architectures in debian via a combination of cross-compilation and binary uploads. At the moment, every major debian architecture is supported. Recently I had to prepare a new upload due to the gmp transition and, now that squeeze is released, took the opportunity to split out the arch-independent components of this monolithic package. Unfortunately, this had unforeseen consequences on the buildd system.

The problem is that the old 'any' package (20100608-2) was removed from unstable before the new package's (20100608-3) buildd runs completed; only the amd64 buildd was fast enough. I am not entirely clear on the cause, but the consequence is clear enough: the buildds can no longer install the old version of mlton needed to bootstrap the new version.

It has been proposed to me to manually rebuild the package on every debian architecture and then binary-upload the result. To that end, I request installation in a sid chroot of these packages from unstable: libgmp-dev htmldoc texlive-latex-base procps debhelper cdbs quilt joe. Additionally, please install from squeeze (it should still install cleanly in a sid chroot) the package: mlton.
I request the above packages to be installed on these machines:

  albeniz.debian.org    alpha           8g    y
  abel.debian.org       armel           1.5g  y
  merulo.debian.org     ia64            8g    y
  asdfasdf.debian.net   kfreebsd-amd64  2g    y  ad...@asdfasdf.debian.net
  io.debian.net         kfreebsd-i386   1.5g  y  ad...@io.debian.net
  gabrielli.debian.org  mips            1.6g  y
  zelenka.debian.org    s390            1g    y
  smetana.debian.org    sparc           2g    y

Unfortunately, a look over the currently available porterboxes shows that not every architecture can be fixed this way: paer does not have a sid chroot, strauss has insufficient memory, mipsel has no porterbox at all, and pescetti has insufficient memory. This means that I cannot rebuild the package for: hppa, hurd-i386, mipsel, powerpc. The available buildd machines *can* rebuild the package on these architectures, but will not do so as long as the old version is missing from unstable.

I believe strauss has configurable main memory. If it could temporarily be given 1.5G, that would solve hurd-i386. Would it be possible to get a sid dchroot set up on paer? If yes, that's another architecture fixed.

I am looking for a solution to this build problem for mipsel and powerpc. If the old mlton 'any' package (still in squeeze) were re-added to unstable, that would work (and also render the above package installation requests unnecessary). I'm open to any other suggestions.

One option I have considered: by hand, rip the contents out of the old mlton 'any' package, rebundle the old contents as the "new version", and do a binary upload. This way I could get packages for powerpc and mipsel that would work to properly bootstrap a new upload on the buildds. This is a pretty nasty hack and would mean that the sources do not match the binaries for this one uploaded version, but this might be acceptable as a transitional step...?

Any help appreciated!

PS. I could not determine which mailing list is haunted by the ftp-masters. If debian-admin is wrong, please forward this.
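For reference, the by-hand rebuild I have in mind on each porterbox would look roughly like the sketch below. This is pseudocode for the procedure, not a tested recipe; paths are illustrative, and on a real porterbox the squeeze .debs would be unpacked into my home directory rather than installed system-wide:

```
# 1. Bootstrap: make the old (squeeze) mlton available to the build.
dpkg -x mlton_20100608-2_<arch>.deb ~/old-mlton   # unpack, no root needed
export PATH=~/old-mlton/usr/bin:$PATH

# 2. Fetch and build the new source, architecture-dependent packages only.
apt-get source mlton                              # 20100608-3 from sid
cd mlton-20100608
dpkg-buildpackage -B -uc -us                      # -B: binary (arch-only) build

# 3. Sign and binary-upload the result.
cd ..
debsign mlton_20100608-3_<arch>.changes
dput mlton_20100608-3_<arch>.changes
```

The same steps, done once per architecture, would repopulate unstable with installable mlton binaries so the buildds can take over again for subsequent uploads.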
Re: mlton any->all package transition breakage
On Fri, Apr 1, 2011 at 9:58 PM, Peter Palfrader wrote:
> all the other chroots are now fucked because we did as you asked, and
> for some reason it wants to bring in mlton-doc

The problem there is that you're trying to install and/or upgrade the sid one, not the squeeze one.

> I think we won't be doing anything like that again any time soon.

Feel free to purge whatever mlton package you have installed. I've been fairly successful building the package by installing the .debs in my home directory.
Re: mlton any->all package transition breakage
On Fri, Apr 1, 2011 at 10:47 PM, Peter Palfrader wrote:
>> The problem there is you're trying to install and/or upgrade the sid one,
>> not the squeeze one.
>
> No, the squeeze package installed cleanly. Now apt-get update &&
> upgrade breaks. That means the package is buggy.

Yes, I know it's buggy; that's why I'm trying to fix it. It is missing a Replaces/Breaks, which leads to upgrade problems. However, before I can upload a new version that fixes that, I need to get a working bootstrap version that the buildd will actually install.