On Tue, Jun 29, 2010 at 11:15 AM, Jeff Kilpatrick <kilpatrick.j...@gmail.com> wrote:
> Hey Fergus.
>
> You are correct about the "another problem which may happen". I applied
> the fix you suggested, and set the temp_o and temp_i back to orig_output and
> orig_input through the dcc_set_output() calls, and I am now getting
> consistent checksums. I will be doing builds all through the afternoon to
> confirm checksums match every single time.
>
> Thank you all so very much. You have literally saved us thousands of hours
> in compile time, per week.
>
> -Jeff
>
> My changes:
>
> serve.c:
>
>     if (cpp_where == DCC_CPP_ON_SERVER) {
>         if (dcc_r_many_files(in_fd, temp_dir, compr)
> //          || dcc_set_output(argv, temp_o)
>             || dcc_set_output(argv, orig_output)
>             || tweak_arguments_for_server(argv, temp_dir, deps_fname,
>                                           &dotd_target, &tweaked_argv))
>             goto out_cleanup;
>
>         if ((ret = dcc_r_token_file(in_fd, "DOTI", temp_i, compr))
>             || (ret = dcc_set_input(argv, orig_input))
>             || (ret = dcc_set_output(argv, orig_output)))
>
> //          || (ret = dcc_set_input(argv, temp_i))
> //          || (ret = dcc_set_output(argv, temp_o)))
>             goto out_cleanup;

When posting patches to the mailing list, please use "svn diff" or
"diff -u".

If that's all you've changed, I don't think your patch is correct.
You'd need to also update the code which sends the object file back to
the client:

    if ((ret = dcc_x_file(out_fd, temp_o, "DOTO", compr, NULL)))
        goto out_cleanup;

Also, I think your change may cause problems in non-pump mode if two
different clients attempt to compile the same object file at the same
time.

Cheers,
  Fergus.

>
> On Tue, Jun 29, 2010 at 11:59 AM, Fergus Henderson <fer...@google.com> wrote:
>
>> On Tue, Jun 29, 2010 at 9:52 AM, Jeff Kilpatrick <kilpatrick.j...@gmail.com> wrote:
>>
>>> Yes, I have tried both pump and regular mode, and both behave the same
>>> way.
>>>
>>
>> Well, I don't think it is exactly the same way.
>> In the non-pump case,
>> distcc does the preprocessing locally, sends the ".ii" file to the server,
>> and the server then invokes gcc with the name of the ".ii" file, e.g.
>> /tmp/distccd_ac31c96a.ii... that is what gcc ends up embedding in the object
>> file.
>> In the pump case, the source file names used on the server are the same as
>> the source file names used on the client, so the problem in your original
>> email won't happen in that case.
>>
>> But there is another problem which may happen in both cases:
>> distcc changes the command line on the server to use a different object
>> file name, e.g. "-o ./tmp/distccd_ac31c96a.o",
>> and gcc may embed the name of the object file in the object file.
>> In the non-pump case, this changing of the object file name is needed to
>> ensure that two different distcc invocations on the same server don't try to
>> write to the same file.
>> But in the pump case, where the compilation is being invoked in a
>> temporary directory, I don't think it is actually necessary to change the
>> object file name...
>> I think the code to do that has just been inherited for historical reasons
>> from the non-pump case.
>> So it may be possible to modify distcc to avoid doing that in the pump
>> case.
>> The code which changes the object file name is in the dcc_run_job()
>> function in src/serve.c (look in particular for the calls to
>> dcc_set_output(), but other parts of the function would need modification
>> too).
>> But I guess if you're not going to be using pump mode, that wouldn't help
>> you.
>>
>> You may find that the object files are more deterministic if you don't
>> pass the "-g" flag to the compiler.
>>
>> Cheers,
>>   Fergus.
>>
>>> A lot of the projects that I will be compiling include boost, and I
>>> believe that the pump fails on those, and falls back to regular mode.
>>>
>>> -Jeff
>>>
>>> On Tue, Jun 29, 2010 at 10:48 AM, Fergus Henderson <fer...@google.com> wrote:
>>>
>>>> Did you try using pump mode?
>>>> That should give you a better build speed-up and may also avoid this
>>>> issue.
>>>>
>>>> On Jun 29, 2010 6:32 AM, "Jeff Kilpatrick" <kilpatrick.j...@gmail.com> wrote:
>>>> > Oops, my original response went directly to Ihar, rather than to the list.
>>>> >
>>>> > ----
>>>> >
>>>> > Thank you for your response.
>>>> >
>>>> > We do have a tool internally that could 'scrub' the object file of its
>>>> > dynamic symbols, and could be adapted for this purpose. However, I'm
>>>> > hesitant to modify anything with the .o and .so with an external tool, as in
>>>> > some cases, it may be hiding a legitimate issue. Once an exception makes it
>>>> > into the code, it's tempting to continue adding exceptions to fix issues.
>>>> > Before you know it, you have 600 branches with unique 'fixes' to them :)
>>>> >
>>>> > Once we get a consistent checksum on the .o and .so files, they'll be
>>>> > packaged into a .iso, which will also need to be repeatable. This can be
>>>> > challenging as well, since attributes on the files can affect the final
>>>> > checksum.
>>>> >
>>>> > -Jeff
>>>> >
>>>> > On Tue, Jun 29, 2010 at 6:58 AM, Ihar `Philips` Filipau <thephil...@gmail.com> wrote:
>>>> >
>>>> >> Hi Jeff!
>>>> >>
>>>> >> You can try to collect the check-sum only for the ELF segments which are
>>>> >> actually derived from the source code, omitting the segments with the
>>>> >> extra compiler's info. I do not know any ready tool for the purpose, but
>>>> >> coding something like this - print on stdout all segments except the
>>>> >> black-listed - shouldn't be too complicated.
>>>> >>
>>>> >> On Tue, Jun 29, 2010 at 11:41 AM, Jeff Kilpatrick <kilpatrick.j...@gmail.com> wrote:
>>>> >>
>>>> >>> Thank you for your response.
>>>> >>>
>>>> >>> Yes, this is the only difference in the object file.
>>>> >>> We've taken great pains over the last few years, removing anything
>>>> >>> that would cause checksums to mismatch.
>>>> >>>
>>>> >>> I will do some research myself, and talk to a few developers to see if
>>>> >>> they can help me.
>>>> >>>
>>>> >>> Thanks
>>>> >>> -Jeff
>>>> >>>
>>>> >>> On Tue, Jun 29, 2010 at 1:32 AM, Martin Pool <m...@sourcefrog.net> wrote:
>>>> >>>
>>>> >>>> On 29 June 2010 13:02, Jeff Kilpatrick <kilpatrick.j...@gmail.com> wrote:
>>>> >>>> > Hello,
>>>> >>>> >
>>>> >>>> > At my work, we've just begun to investigate how much of an impact that
>>>> >>>> > distcc will have on our builds.
>>>> >>>> >
>>>> >>>> > We typically perform 200 builds a week, ranging from a thousand lines of
>>>> >>>> > code, up to 600,000 lines of code each. Our back end build scripts are based
>>>> >>>> > on python, and use Linux make to build. We are running VMware images on a
>>>> >>>> > blade cluster, and each of our three new build servers has 20GHz processing
>>>> >>>> > power, with 4G of RAM. Our primary build environments are loop back ISOs,
>>>> >>>> > from a central CIFS server, and are unioned together with unionfs. Our
>>>> >>>> > source code is then copied into this environment, and we proceed with our
>>>> >>>> > build, using chroot to enter our build environment. Our 'distcc' machines
>>>> >>>> > use the same loop back system, with only our OS and distcc being accessible.
>>>> >>>>
>>>> >>>> That's pretty cool.
>>>> >>>>
>>>> >>>> > One of the most important things for our builds, due to the market that we
>>>> >>>> > are in, is that our builds must be reproducible, with repeatable md5sums on
>>>> >>>> > our shared objects, based on the same label and same dependencies.
>>>> >>>> > In our recent tests, we were able to take a particular build from
>>>> >>>> > 24 minutes to 14 minutes, then finally 5 minutes, using distcc and
>>>> >>>> > adjusting our VMs.
>>>> >>>> > However, when performing an md5sum on our final shared objects / object
>>>> >>>> > files, the checksums change every build. We dropped down to just using g++
>>>> >>>> > to perform our linking, all locally, but our object files are still
>>>> >>>> > mismatching.
>>>> >>>> >
>>>> >>>> > In the object files' `objdump -s` output, it appears that an entry is being
>>>> >>>> > made into all our object files with the following syntax "distccd_XXXXX",
>>>> >>>> > with XXXXX being a seemingly random combination of characters.
>>>> >>>>
>>>> >>>> Hi Jeff,
>>>> >>>>
>>>> >>>> I think this is coming from gcc recording the input file name in the
>>>> >>>> object file. distccd_xxxx.ii is the temporary file name used on the
>>>> >>>> server.
>>>> >>>>
>>>> >>>> > In the same object file, compiled locally without distcc, we get a rather
>>>> >>>> > generic <built-in> placeholder.
>>>> >>>>
>>>> >>>> I think this means it's coming from the builtin preprocessor.
>>>> >>>>
>>>> >>>> I probably won't have time to work on this myself but if you have a
>>>> >>>> programmer interested in it there are two possible avenues:
>>>> >>>>
>>>> >>>> - make gcc read from a file called <built-in> in a temporary subdirectory
>>>> >>>>
>>>> >>>> - find some way to stop it recording the compiler input file name
>>>> >>>>
>>>> >>>> Is that the only difference in the object files? It's pretty common
>>>> >>>> for compilers to also record something about the time the compilation
>>>> >>>> was run or for source files to build this in, which would mean they
>>>> >>>> change every time.
>>>> >>>>
>>>> >>>> >
>>>> >>>> > I've reviewed the source code for distcc, and seen a few references to this
>>>> >>>> > distccd_xxxxx. Unfortunately, I'm not a programmer, and thus am at a loss on
>>>> >>>> > how to further troubleshoot this, or even if it's possible to get consistent
>>>> >>>> > checksums with distcc.
>>>> >>>> >
>>>> >>>> > Versions
>>>> >>>> > =======
>>>> >>>> > g++ (Gentoo 4.3.2-r4 p1.8, pie-10.1.5) 4.3.2
>>>> >>>> >
>>>> >>>> > distcc 3.1 i686-pc-linux-gnu
>>>> >>>> > (protocols 1, 2 and 3) (default port 3632)
>>>> >>>> > built Mar 29 2010 10:55:35
>>>> >>>> >
>>>> >>>> > Kernel: 2.6.9-89.ELsmp
>>>> >>>> >
>>>> >>>> > Command being issued:
>>>> >>>> > DISTCC_VERBOSE=1 make -j24 CXX="distcc"
>>>> >>>> >
>>>> >>>> > Here's the partial output of objdump -s:
>>>> >>>> >  04f0 00030000 5f6d6f76 655f636f 6e737472  ...._move_constr
>>>> >>>> >  0500 7563745f 66776b2e 68000300 00474454  uct_fwk.h....GDT
>>>> >>>> >  0510 79706573 2e68000a 00007365 72646566  ypes.h....serdef
>>>> >>>> >  0520 732e6800 01000073 75666669 782e6870  s.h....suffix.hp
>>>> >>>> >  0530 70000b00 00646973 74636364 5f616333  p....distccd_ac3
>>>> >>>> >  0540 31633936 612e6969 000c0000 61646c5f  1c96a.ii....adl_
>>>> >>>> >  0550 62617272 6965722e 68707000 0d000062  barrier.hpp....b
>>>> >>>> >  0560 6f6f6c5f 6677642e 68707000 0e000069  ool_fwd.hpp....i
>>>> >>>> >  0570 6e746567 72616c5f 635f7461 672e6870  ntegral_c_tag.hp
>>>> >>>> >  0580 70000e00 00766f69 645f6677 642e6870  p....void_fwd.hp
>>>> >>>> >
>>>> >>>> > Thank you for reviewing my issue.
>>>> >>>> >
>>>> >>>> > -Jeff
>>>> >>>> >
>>>> >>>> > __
>>>> >>>> > distcc mailing list http://distcc.samba.org/
>>>> >>>> > To unsubscribe or change options:
>>>> >>>> > https://lists.samba.org/mailman/listinfo/distcc
>>>> >>>> >
>>>> >>>>
>>>> >>>> --
>>>> >>>> Martin
>>>> >>>>
>>>> >>>
>>>> >>> __
>>>> >>> distcc mailing list http://distcc.samba.org/
>>>> >>> To unsubscribe or change options:
>>>> >>> https://lists.samba.org/mailman/listinfo/distcc
>>>> >>>
>>>> >>
>>>> >> --
>>>> >> Don't walk behind me, I may not lead.
>>>> >> Don't walk in front of me, I may not follow.
>>>> >> Just walk beside me and be my friend.
>>>> >> -- Albert Camus (attributed to)
>>>> >>
>>>
>>
>> --
>> Fergus Henderson <fer...@google.com>
>

--
Fergus Henderson <fer...@google.com>
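
For anyone who wants to check their own object files for the symptom
discussed in this thread, a small scanner can count embedded server
temp-file names. The following is only a minimal sketch and is not part
of distcc: the function name count_distccd_names and the
DISTCCD_SCAN_MAIN build flag are hypothetical, and it assumes the names
look like "distccd_" followed by eight lowercase hex digits, as in the
objdump output quoted above (e.g. distccd_ac31c96a.ii).

```c
/* Sketch: count embedded distccd temp-file names in an object file.
 * Assumption: names look like "distccd_" + 8 lowercase hex digits,
 * as in the objdump output quoted in this thread. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DCC_PREFIX "distccd_"
#define DCC_HEX_DIGITS 8

static int is_lower_hex(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
}

/* Hypothetical helper, not part of distcc: returns how many times
 * the temp-name pattern occurs in buf[0..len). */
size_t count_distccd_names(const unsigned char *buf, size_t len)
{
    const size_t plen = strlen(DCC_PREFIX);
    const size_t need = plen + DCC_HEX_DIGITS;
    size_t count = 0;
    size_t i, j;

    assert(buf != NULL || len == 0);
    if (len < need)
        return 0;

    for (i = 0; i + need <= len; i++) {
        if (memcmp(buf + i, DCC_PREFIX, plen) != 0)
            continue;
        for (j = 0; j < DCC_HEX_DIGITS; j++) {
            if (!is_lower_hex(buf[i + plen + j]))
                break;
        }
        if (j == DCC_HEX_DIGITS) {
            count++;
            i += need - 1; /* skip past this match */
        }
    }
    return count;
}

#ifdef DISTCCD_SCAN_MAIN
/* Build with -DDISTCCD_SCAN_MAIN to get a small command-line tool. */
int main(int argc, char **argv)
{
    FILE *fp;
    unsigned char *data = NULL;
    unsigned char chunk[4096];
    size_t size = 0, cap = 0, n;

    if (argc != 2) {
        fprintf(stderr, "usage: %s file.o\n", argv[0]);
        return 2;
    }
    fp = fopen(argv[1], "rb");
    if (fp == NULL) {
        perror(argv[1]);
        return 2;
    }
    /* Read the whole file into memory, growing the buffer as needed. */
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        if (size + n > cap) {
            unsigned char *tmp;
            cap = (size + n) * 2;
            tmp = realloc(data, cap);
            if (tmp == NULL) {
                free(data);
                fclose(fp);
                return 2;
            }
            data = tmp;
        }
        memcpy(data + size, chunk, n);
        size += n;
    }
    fclose(fp);

    n = count_distccd_names(data, size);
    printf("%s: %lu distccd temp name(s)\n", argv[1], (unsigned long)n);
    free(data);
    return n > 0 ? 1 : 0; /* non-zero exit when a temp name is found */
}
#endif
```

Built with -DDISTCCD_SCAN_MAIN, this could be run over each .o after a
build. A count of zero for every file would suggest the temp names are
no longer being embedded, though it says nothing about other sources of
nondeterminism such as embedded timestamps.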