You are correct; I did miss some spots that I've copied into this email. I've done a few more changes locally, and am continuing my testing.
The code paste wasn't intended to be a patch. I'm a non-programmer (integration engineering only), so I wouldn't think any of my changes would be up to par for an official submit.

-Jeff

On Tue, Jun 29, 2010 at 12:50 PM, Fergus Henderson <fer...@google.com> wrote:

> On Tue, Jun 29, 2010 at 11:15 AM, Jeff Kilpatrick <kilpatrick.j...@gmail.com> wrote:
>
>> Hey Fergus.
>>
>> You are correct about the "another problem which may happen". I applied
>> the fix you suggested, and set the temp_o and temp_i back to orig_output and
>> orig_input through the dcc_set_output() calls, and I am now getting
>> consistent checksums. I will be doing builds all through the afternoon to
>> confirm checksums match every single time.
>>
>> Thank you all so very much. You have literally saved us thousands of hours
>> in compile time, per week.
>>
>> -Jeff
>>
>> My changes:
>>
>> serve.c:
>>
>>     if (cpp_where == DCC_CPP_ON_SERVER) {
>>         if (dcc_r_many_files(in_fd, temp_dir, compr)
>> //          || dcc_set_output(argv, temp_o)
>>             || dcc_set_output(argv, orig_output)
>>             || tweak_arguments_for_server(argv, temp_dir, deps_fname,
>>                                           &dotd_target, &tweaked_argv))
>>             goto out_cleanup;
>>
>>         if ((ret = dcc_r_token_file(in_fd, "DOTI", temp_i, compr))
>>             || (ret = dcc_set_input(argv, orig_input))
>>             || (ret = dcc_set_output(argv, orig_output)))
>>
>> //          || (ret = dcc_set_input(argv, temp_i))
>> //          || (ret = dcc_set_output(argv, temp_o)))
>>             goto out_cleanup;
>
> When posting patches to the mailing list, please use "svn diff" or "diff -u".
> If that's all you've changed, I don't think your patch is correct.
> You'd need to also update the code which sends the object file back to the
> client:
>
>     if ((ret = dcc_x_file(out_fd, temp_o, "DOTO", compr, NULL)))
>         goto out_cleanup;
>
> Also, I think your change may cause problems in non-pump mode if two
> different clients attempt to compile the same object file at the same time.
>
> Cheers,
> Fergus.
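Fergus's two objections above can be illustrated with a small sketch. It is not distcc code: `job_paths`, `plan_job` and the `cpp_where` enum below are hypothetical names invented for this example. It assumes, as the thread suggests but does not confirm, that keeping the original output name is only safe when preprocessing happens on the server (pump mode), and that any other job needs a per-job unique name. The one firm point it encodes is that the file sent back to the client has to be the file the compiler was told to write.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for illustration only; none of this is distcc API. */
enum cpp_where { CPP_ON_CLIENT, CPP_ON_SERVER };

struct job_paths {
    char compile_output[256]; /* name handed to the compiler via -o */
    char send_back[256];      /* file the server returns to the client */
};

/*
 * Choose the output name for one job and record it in *p.
 * Assumption: with server-side preprocessing each job has its own
 * directory, so the original name cannot collide; otherwise a unique
 * temporary name is used so two jobs never write the same file.
 * Returns 0 on success, -1 if the chosen name does not fit.
 */
int plan_job(struct job_paths *p, enum cpp_where where,
             const char *orig_output, const char *unique_tmp_output)
{
    assert(p != NULL && orig_output != NULL && unique_tmp_output != NULL);

    const char *chosen =
        (where == CPP_ON_SERVER) ? orig_output : unique_tmp_output;

    int n = snprintf(p->compile_output, sizeof p->compile_output,
                     "%s", chosen);
    if (n < 0 || (size_t)n >= sizeof p->compile_output)
        return -1;

    /* The object sent back must be the one the compiler actually wrote. */
    memcpy(p->send_back, p->compile_output, (size_t)n + 1);
    return 0;
}
```

Calling `plan_job(&p, CPP_ON_CLIENT, "foo.o", "/tmp/distccd_ac31c96a.o")` leaves both fields set to the temporary name, while `CPP_ON_SERVER` leaves both set to `foo.o`. Deriving the send-back path from the compile path in one place is what keeps the two from drifting apart.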
>
>>
>> On Tue, Jun 29, 2010 at 11:59 AM, Fergus Henderson <fer...@google.com> wrote:
>>
>>> On Tue, Jun 29, 2010 at 9:52 AM, Jeff Kilpatrick <kilpatrick.j...@gmail.com> wrote:
>>>
>>>> Yes, I have tried both pump and regular mode, and both behave the same
>>>> way.
>>>
>>> Well, I don't think it is exactly the same way. In the non-pump case,
>>> distcc does the preprocessing locally, sends the ".ii" file to the server,
>>> and the server then invokes gcc with the name of the ".ii" file, e.g.
>>> /tmp/distccd_ac31c96a.ii... that is what gcc ends up embedding in the object
>>> file.
>>> In the pump case, the source file names used on the server are the same
>>> as the source file names used on the client, so the problem in your original
>>> email won't happen in that case.
>>>
>>> But there is another problem which may happen in both cases:
>>> distcc changes the command line on the server to use a different object
>>> file name, e.g. "-o ./tmp/distccd_ac31c96a.o",
>>> and gcc may embed the name of the object file in the object file.
>>> In the non-pump case, this changing of the object file name is needed to
>>> ensure that two different distcc invocations on the same server don't try to
>>> write to the same file.
>>> But in the pump case, where the compilation is being invoked in a
>>> temporary directory, I don't think it is actually necessary to change the
>>> object file name...
>>> I think the code to do that has just been inherited for historical
>>> reasons from the non-pump case.
>>> So it may be possible to modify distcc to avoid doing that in the pump
>>> case.
>>> The code which changes the object file name is in the dcc_run_job()
>>> function in src/serve.c (look in particular for the calls to
>>> dcc_set_output(), but other parts of the function would need modification
>>> too).
>>> But I guess if you're not going to be using pump mode, that wouldn't help
>>> you.
>>>
>>> You may find that the object files are more deterministic if you don't
>>> pass the "-g" flag to the compiler.
>>>
>>> Cheers,
>>> Fergus.
>>>
>>>> A lot of the projects that I will be compiling include boost, and I
>>>> believe that the pump fails on those, and falls back to regular mode.
>>>>
>>>> -Jeff
>>>>
>>>> On Tue, Jun 29, 2010 at 10:48 AM, Fergus Henderson <fer...@google.com> wrote:
>>>>
>>>>> Did you try using pump mode?
>>>>> That should give you a better build speed-up and may also avoid this
>>>>> issue.
>>>>>
>>>>> On Jun 29, 2010 6:32 AM, "Jeff Kilpatrick" <kilpatrick.j...@gmail.com> wrote:
>>>>> > Oops, my original response went directly to Ihar, rather than to the
>>>>> > list.
>>>>> >
>>>>> > ----
>>>>> >
>>>>> > Thank you for your response.
>>>>> >
>>>>> > We do have a tool internally that could 'scrub' the object file of its
>>>>> > dynamic symbols, and could be adapted for this purpose. However, I'm
>>>>> > hesitant to modify anything with the .o and .so with an external tool,
>>>>> > as in some cases, it may be hiding a legitimate issue. Once an exception
>>>>> > makes it into the code, it's tempting to continue adding exceptions to
>>>>> > fix issues. Before you know it, you have 600 branches with unique
>>>>> > 'fixes' to them :)
>>>>> >
>>>>> > Once we get a consistent checksum on the .o and .so files, they'll be
>>>>> > packaged into a .iso, which will also need to be repeatable. This can be
>>>>> > challenging as well, since attributes on the files can affect the final
>>>>> > checksum.
>>>>> >
>>>>> > -Jeff
>>>>> >
>>>>> > On Tue, Jun 29, 2010 at 6:58 AM, Ihar `Philips` Filipau <thephil...@gmail.com> wrote:
>>>>> >
>>>>> >> Hi Jeff!
>>>>> >>
>>>>> >> You can try to collect the check-sum only for the ELF segments which
>>>>> >> are actually derived from the source code, omitting the segments with
>>>>> >> the extra compiler's info.
>>>>> >> I do not know any ready tool for the purpose, but coding something like
>>>>> >> this - print on stdout all segments except the black-listed - shouldn't
>>>>> >> be too complicated.
>>>>> >>
>>>>> >> On Tue, Jun 29, 2010 at 11:41 AM, Jeff Kilpatrick <kilpatrick.j...@gmail.com> wrote:
>>>>> >>
>>>>> >>> Thank you for your response.
>>>>> >>>
>>>>> >>> Yes, this is the only difference in the object file. We've taken great
>>>>> >>> pains over the last few years, removing anything that would cause
>>>>> >>> checksums to mismatch.
>>>>> >>>
>>>>> >>> I will do some research myself, and talk to a few developers to see if
>>>>> >>> they can help me.
>>>>> >>>
>>>>> >>> Thanks
>>>>> >>> -Jeff
>>>>> >>>
>>>>> >>> On Tue, Jun 29, 2010 at 1:32 AM, Martin Pool <m...@sourcefrog.net> wrote:
>>>>> >>>
>>>>> >>>> On 29 June 2010 13:02, Jeff Kilpatrick <kilpatrick.j...@gmail.com> wrote:
>>>>> >>>> > Hello,
>>>>> >>>> >
>>>>> >>>> > At my work, we've just begun to investigate how much of an impact
>>>>> >>>> > that distcc will have on our builds.
>>>>> >>>> >
>>>>> >>>> > We typically perform 200 builds a week, ranging from a thousand
>>>>> >>>> > lines of code, up to 600,000 lines of code each. Our back end build
>>>>> >>>> > scripts are based on python, and use Linux make to build. We are
>>>>> >>>> > running VMWare images on a blade cluster, and each of our three new
>>>>> >>>> > build servers have 20Ghz processing power, with 4G of RAM. Our
>>>>> >>>> > primary build environments are loop back ISOs, from a central CIFS
>>>>> >>>> > server, and are unioned together with unionfs. Our source code is
>>>>> >>>> > then copied into this environment, and we proceed with our build,
>>>>> >>>> > using chroot to enter our build environment.
>>>>> >>>> > Our 'distcc' machines use the same loop back system, with only our
>>>>> >>>> > OS and distcc being accessible.
>>>>> >>>>
>>>>> >>>> That's pretty cool.
>>>>> >>>>
>>>>> >>>> > One of the most important things for our builds, due to the market
>>>>> >>>> > that we are in, is that our builds must be reproducible, with
>>>>> >>>> > repeatable md5sums on our shared objects, based on the same label
>>>>> >>>> > and same dependencies. In our recent tests, we were able to take a
>>>>> >>>> > particular build from 24 minutes to 14 minutes, then finally 5
>>>>> >>>> > minutes, using distcc and adjusting our VMs. However, when
>>>>> >>>> > performing an md5sum on our final shared objects / object files,
>>>>> >>>> > the checksums change every build. We dropped down to just using g++
>>>>> >>>> > to perform our linking, all locally, but our object files are still
>>>>> >>>> > mismatching.
>>>>> >>>> >
>>>>> >>>> > In the object files' `objdump -s` output, it appears that an entry
>>>>> >>>> > is being made into all our object files with the following syntax
>>>>> >>>> > "distccd_XXXXX", with XXXXX being a seemingly random combination of
>>>>> >>>> > characters.
>>>>> >>>>
>>>>> >>>> Hi Jeff,
>>>>> >>>>
>>>>> >>>> I think this is coming from gcc recording the input file name in the
>>>>> >>>> object file. distccd_xxxx.ii is the temporary file name used on the
>>>>> >>>> server.
>>>>> >>>>
>>>>> >>>> > In the same object file, compiled locally without distcc, we get a
>>>>> >>>> > rather generic <built-in> placeholder.
>>>>> >>>>
>>>>> >>>> I think this means it's coming from the builtin preprocessor.
>>>>> >>>>
>>>>> >>>> I probably won't have time to work on this myself but if you have a
>>>>> >>>> programmer interested in it there are two possible avenues:
>>>>> >>>>
>>>>> >>>> - make gcc read from a file called <built-in> in a temporary
>>>>> >>>>   subdirectory
>>>>> >>>>
>>>>> >>>> - find some way to stop it recording the compiler input file name
>>>>> >>>>
>>>>> >>>> Is that the only difference in the object files? It's pretty common
>>>>> >>>> for compilers to also record something about the time the compilation
>>>>> >>>> was run or for source files to build this in, which would mean they
>>>>> >>>> change every time.
>>>>> >>>>
>>>>> >>>> > I've reviewed the source code for distcc, and seen a few references
>>>>> >>>> > to this distccd_xxxxx. Unfortunately, I'm not a programmer, and
>>>>> >>>> > thus am at a loss on how to further troubleshoot this, or even if
>>>>> >>>> > it's possible to get consistent checksums with distcc.
>>>>> >>>> >
>>>>> >>>> > Versions
>>>>> >>>> > =======
>>>>> >>>> > g++ (Gentoo 4.3.2-r4 p1.8, pie-10.1.5) 4.3.2
>>>>> >>>> >
>>>>> >>>> > distcc 3.1 i686-pc-linux-gnu
>>>>> >>>> > (protocols 1, 2 and 3) (default port 3632)
>>>>> >>>> > built Mar 29 2010 10:55:35
>>>>> >>>> >
>>>>> >>>> > Kernel: 2.6.9-89.ELsmp
>>>>> >>>> >
>>>>> >>>> > Command being issued:
>>>>> >>>> > DISTCC_VERBOSE=1 make -j24 CXX="distcc"
>>>>> >>>> >
>>>>> >>>> > Here's the partial output of objdump -s:
>>>>> >>>> > 04f0 00030000 5f6d6f76 655f636f 6e737472  ...._move_constr
>>>>> >>>> > 0500 7563745f 66776b2e 68000300 00474454  uct_fwk.h....GDT
>>>>> >>>> > 0510 79706573 2e68000a 00007365 72646566  ypes.h....serdef
>>>>> >>>> > 0520 732e6800 01000073 75666669 782e6870  s.h....suffix.hp
>>>>> >>>> > 0530 70000b00 00646973 74636364 5f616333  p....distccd_ac3
>>>>> >>>> > 0540 31633936 612e6969 000c0000 61646c5f  1c96a.ii....adl_
>>>>> >>>> > 0550 62617272 6965722e 68707000 0d000062  barrier.hpp....b
>>>>> >>>> > 0560 6f6f6c5f 6677642e 68707000 0e000069  ool_fwd.hpp....i
>>>>> >>>> > 0570 6e746567 72616c5f 635f7461 672e6870  ntegral_c_tag.hp
>>>>> >>>> > 0580 70000e00 00766f69 645f6677 642e6870  p....void_fwd.hp
>>>>> >>>> >
>>>>> >>>> > Thank you for reviewing my issue.
>>>>> >>>> >
>>>>> >>>> > -Jeff
>>>>> >>>> >
>>>>> >>>> > __
>>>>> >>>> > distcc mailing list http://distcc.samba.org/
>>>>> >>>> > To unsubscribe or change options:
>>>>> >>>> > https://lists.samba.org/mailman/listinfo/distcc
>>>>> >>>> >
>>>>> >>>>
>>>>> >>>> --
>>>>> >>>> Martin
>>>>> >>>>
>>>>> >>>
>>>>> >>> __
>>>>> >>> distcc mailing list http://distcc.samba.org/
>>>>> >>> To unsubscribe or change options:
>>>>> >>> https://lists.samba.org/mailman/listinfo/distcc
>>>>> >>>
>>>>> >>
>>>>> >> --
>>>>> >> Don't walk behind me, I may not lead.
>>>>> >> Don't walk in front of me, I may not follow.
>>>>> >> Just walk beside me and be my friend.
>>>>> >> -- Albert Camus (attributed to)
>>>>> >>
>>>>>
>>>>
>>>
>>> --
>>> Fergus Henderson <fer...@google.com>
>>>
>>
>
> --
> Fergus Henderson <fer...@google.com>
>
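For readers who want to check their own object files for the symptom shown in the objdump output, here is a minimal sketch in plain C. `find_marker` reports where a marker such as "distccd_" first appears in a byte buffer. `masked_fnv1a` is a deliberately simplified stand-in for Ihar's suggestion: it does not parse ELF segments at all, it just hashes the bytes while replacing the hex digits that follow the marker with a fixed placeholder. Both function names are made up for this example, FNV-1a is an arbitrary choice of hash rather than md5sum, and the idea that the random part of the name is a run of hex digits is an assumption based only on the single name visible in the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset of the first occurrence of marker in buf, or -1 if absent. */
long find_marker(const unsigned char *buf, size_t len, const char *marker)
{
    assert(buf != NULL && marker != NULL);
    size_t mlen = strlen(marker);
    if (mlen == 0 || mlen > len)
        return -1;
    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) == 0)
            return (long)i;
    }
    return -1;
}

/* One FNV-1a step: xor in the byte, then multiply by the FNV prime. */
static uint32_t fnv_step(uint32_t h, unsigned char byte)
{
    return (h ^ byte) * 16777619u;
}

/*
 * 32-bit FNV-1a over buf, except that every hex digit directly after an
 * occurrence of marker is hashed as 'X'. Assumption: the per-job random
 * part of the temporary name consists of hex digits only.
 */
uint32_t masked_fnv1a(const unsigned char *buf, size_t len,
                      const char *marker)
{
    assert(buf != NULL && marker != NULL);
    size_t mlen = strlen(marker);
    uint32_t h = 2166136261u; /* FNV offset basis */
    size_t i = 0;

    while (i < len) {
        if (mlen > 0 && i + mlen <= len
            && memcmp(buf + i, marker, mlen) == 0) {
            for (size_t k = 0; k < mlen; k++)
                h = fnv_step(h, buf[i + k]);   /* the marker itself */
            i += mlen;
            while (i < len && isxdigit(buf[i])) {
                h = fnv_step(h, 'X');          /* masked random part */
                i++;
            }
            continue;
        }
        h = fnv_step(h, buf[i]);
        i++;
    }
    return h;
}
```

Two buffers that differ only in the digits after "distccd_" hash to the same value, while a change anywhere else still changes the hash. Jeff's caution from earlier in the thread applies here as well: any masking of this kind can hide a legitimate difference, so it is better suited to diagnosing where the mismatch comes from than to declaring a build reproducible.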