Re: distcc for pkgsrc issue

2017-07-13 Thread John Halfpenny
> The question is, since folks make heavy use of distcc, does it have
> the same limitations

I'd expect it to have. 

I've not done any compiling in anger with this yet since I want to try 
it with multiple helper machines, but will gladly share experiences 
when I do so. 

You mentioned Firefox and QT; if I understand correctly, these are 
problematic, so I will add them to my list of packages to try if the 
info would be useful.
 
-- 
j...@sdf.org
SDF Public Access UNIX System - http://sdf.org


Re: distcc for pkgsrc issue

2017-07-08 Thread Greg Troxel

Swift Griggs writes:

> On Fri, 7 Jul 2017, John Halfpenny wrote:
>> Just an update for posterity that I resolved this issue.
>
> Interesting. The wrapper script idea reminds me of another question
> about distcc and friends. I've noticed that some packages complain
> with great aggrevation about my use of "make -jX" where X=CPUs.

Presumably you mean "build fails in ways that are hard to reproduce, and
we think it's because of makefile bugs where dependencies that actually
exist are not expressed in the makefile, but with make -j1 no one
notices"?  If so, yes, that's how it is...

> The question is, since folks make heavy use of distcc, does it have
> the same limitations and when you hit a compilation error related to a
> parallel compiler run, is that how the "dont-use-parallel-make"
> warnings get there, or are the mechanics of 'make -j8' and distcc so
> different that errors in one don't mean problems with the other?

They are basically orthogonal, except that actually hitting a "make -j"
bug is probabilistic.

-j8 lets make have 8 jobs running, and doesn't change the compiler.

distcc says that instead of calling cc locally, one essentially does rsh*
to some other box to run cc, after sending the input, and then gets the
output.  There is no explicit relationship to job number.

* I actually use ssh with a control socket and an 'ssh target sleep
  86400' running, so the subsequent ssh commands are fast.

Overall, it's good to keep all cpus busy without hammering the disk any
more than you need to, so I tend to use -jN where N is 1.5x the cpus,
for local.  With distcc, there is latency shipping the jobs back and
forth, so I tend to go even higher, perhaps -j12 when I am using (only)
a remote 4-core machine.  Basically I recommend looking at CPU
utilization on the build box when compiling something that is really
parallelizable and finding the smallest -jN value that results in
sustained 90%+ or so loading, avoiding driving the load average to more
than about 1.5x the CPU count.  Yes, I know that's very handwavy.
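As a rough illustration of the rule of thumb above, here is a minimal shell sketch that turns a CPU count into a starting -jN value. The helper name suggest_jobs and the "local"/"remote" modes are hypothetical; the 3x factor for the remote case is only an assumption read off the -j12 on a 4-core example, and the result is a starting point to tune against observed CPU utilization, not a recommendation.

```shell
# suggest_jobs NCPU [local|remote]
# Hypothetical helper: prints a starting value for -jN / MAKE_JOBS.
suggest_jobs() {
    ncpu="$1"
    mode="${2:-local}"
    case "$mode" in
        local)  echo $(( ncpu * 3 / 2 )) ;;  # about 1.5x the CPUs, rounded down
        remote) echo $(( ncpu * 3 )) ;;      # assumption: e.g. -j12 for a remote 4-core box
        *)      echo "usage: suggest_jobs NCPU [local|remote]" >&2; return 1 ;;
    esac
}
```

On NetBSD the CPU count can be read with "sysctl -n hw.ncpu", so a call might look like: suggest_jobs "$(sysctl -n hw.ncpu)" local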

With timing and number of jobs different, I would expect a different
subset of latent bugs to actually show up.

> The reason I'm asking is occasionally I'd use distcc to get a few of
> my faster NetBSD boxes compiling things like QT or Firefox in
> something less than 60 minutes. It's just that I've never set it up
> because of the many failures I've had trying to use make -j ...

I hope you are using

MAKE_JOBS=8

rather than something else.  Assuming so, note that there is a package
variable "MAKE_JOBS_SAFE", which is supposed to be unset normally and
"no" when a package is known to have a bug building with more than one
job (regardless of whether we know what the bug is).

So if you find something that fails, try with different -j values,
especially 1, and also try restarting the build after failure.  If you
can convince yourself there's a make -j bug, post your logic and someone
can add MAKE_JOBS_SAFE=no to that package.
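The "try different -j values" step can be sketched as a small loop. This is only an illustration: try_job_counts and the BUILD_CMD override are hypothetical names, and the default assumes you are sitting in the package directory where a plain "make" with MAKE_JOBS set on the command line starts the build. You would probably also want a "make clean" between attempts so each run starts from the same state.

```shell
# try_job_counts N...
# Hypothetical helper: run the build once per MAKE_JOBS value and report
# which values succeed. BUILD_CMD is an illustrative override (default: make).
try_job_counts() {
    for jobs in "$@"; do
        if ${BUILD_CMD:-make} MAKE_JOBS="$jobs" >/dev/null 2>&1; then
            echo "MAKE_JOBS=$jobs ok"
        else
            echo "MAKE_JOBS=$jobs FAILED"
        fi
    done
}
```

For example, "try_job_counts 1 4 8" in a package directory. A build that passes at 1 but fails intermittently at higher values is the pattern that suggests a missing makefile dependency.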



Re: distcc for pkgsrc issue

2017-07-07 Thread Swift Griggs

On Fri, 7 Jul 2017, John Halfpenny wrote:

> Just an update for posterity that I resolved this issue.


Interesting. The wrapper script idea reminds me of another question about 
distcc and friends. I've noticed that some packages complain with great 
aggravation about my use of "make -jX" where X=CPUs.


The question is, since folks make heavy use of distcc, does it have the 
same limitations and when you hit a compilation error related to a 
parallel compiler run, is that how the "dont-use-parallel-make" warnings 
get there, or are the mechanics of 'make -j8' and distcc so different that 
errors in one don't mean problems with the other?


The reason I'm asking is occasionally I'd use distcc to get a few of my 
faster NetBSD boxes compiling things like QT or Firefox in something less 
than 60 minutes. It's just that I've never set it up because of the many 
failures I've had trying to use make -j ...


-Swift


Re: distcc for pkgsrc issue

2017-07-07 Thread John Halfpenny
Just an update for posterity that I resolved this issue.

Following on from my previous email, I included the architecture flag when 
building the NetBSD toolchain on Linux (debatable if this alteration was 
required):

./build.sh -a i386 -m i386 -T /usr/gcc-cross-i386/ tools

I had also put the wrong path in my distcc startup script; the correct 
location is the directory holding the netbsdelf C compilers and friends 
(e.g. i486--netbsdelf-gcc).

However, the biggest clue came from this webpage where someone had kindly 
documented this sort of thing before: https://hackaday.io/post/339. In this 
post, the author recommends creating small wrapper scripts for each compilation 
tool (e.g. cc, g++) which look like this:

cc:

#!/bin/sh
exec /usr/gcc-cross-i386/bin/i486--netbsdelf-gcc "$@"
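Writing one wrapper per tool by hand gets repetitive, so here is a minimal sketch that generates them. The helper name make_wrappers, its three arguments, and the choice of names (cc and gcc pointing at the cross gcc, c++ and g++ at the cross g++) are assumptions for illustration; adjust the list to whatever tools the build actually invokes.

```shell
# make_wrappers TOOLDIR PREFIX DEST
# Hypothetical helper: write a two-line exec wrapper for each compiler name.
make_wrappers() {
    tooldir="$1"   # e.g. /usr/gcc-cross-i386/bin
    prefix="$2"    # e.g. i486--netbsdelf-
    dest="$3"      # directory placed first in distccd's PATH
    mkdir -p "$dest" || return 1
    # name:tool pairs; this mapping is an assumption, not from the post
    for pair in cc:gcc gcc:gcc c++:g++ g++:g++; do
        name="${pair%%:*}"
        tool="${pair##*:}"
        printf '#!/bin/sh\nexec %s/%s%s "$@"\n' \
            "$tooldir" "$prefix" "$tool" > "$dest/$name"
        chmod +x "$dest/$name"
    done
}
```

With the paths from this message, a call would look like: make_wrappers /usr/gcc-cross-i386/bin i486--netbsdelf- /some/wrapper/dir (the destination directory is a placeholder).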

I also had to remove ccache from my mk.conf, so the compiler line became:

PKGSRC_COMPILER=distcc gcc

Once all these were done, it worked fine. This might help someone in future, 
and there is now one less orphaned issue in the world.

Best
John

-- 
j...@sdf.org
SDF Public Access UNIX System - http://sdf.org


Re: distcc for pkgsrc issue

2017-06-30 Thread John Halfpenny
Yeah you're right. I think my debian box isn't using the NetBSD tools. 
I'm going to have to make some time to sort that end out properly first, 
work out why systemd isn't starting distcc up properly etc.

-- 
j...@sdf.org
SDF Public Access UNIX System - http://sdf.org


Re: distcc for pkgsrc issue

2017-06-30 Thread maya
On Fri, Jun 30, 2017 at 07:32:54AM +, John Halfpenny wrote:
>   # file test.o
>   test.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
> 

I think that's the output it gives for a linux object file. compare:
$ GOOS=linux go build hello.go; file hello
hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped, with debug_info
$ GOOS=netbsd go build hello.go; file hello
hello: ELF 64-bit LSB executable, x86-64, version 1 (NetBSD), statically linked, for NetBSD 5.99, not stripped, with debug_info


Re: distcc for pkgsrc issue

2017-06-30 Thread John Halfpenny
  [distcc woes]

  [try a simple C program?]

Thanks for taking time to reply, Greg.

I also see the same error as before when adding

  USE_CWRAPPERS=no
  
to mk.conf

But I made a little time to try a simple C program and this has pointed 
me in the right direction:

  # export DISTCC_HOSTS='de.bi.an.pc' 
  # /usr/pkg/bin/distcc gcc -c test.c -o test.o

...here I get a message on the debian pc that the compile was ok, but 
the NetBSD box reports:

  # gcc test.o -o test
  test.o: file not recognized: File format not recognized

However, I noticed the debian machine seems to be generating 64-bit 
object files (the NetBSD box is 32-bit)

  # file test.o
  test.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

This is also true of the pkgsrc object file where it trips up compiling 
ccache:

  # file /usr/pkgsrc/devel/ccache/work/ccache-3.3.4/main.o
  [fileloc]: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

So it looks like I need to address that on the Linux end.
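The check above (does the helper produce 32-bit or 64-bit objects?) can be scripted against the output of file(1). A minimal sketch, assuming the description contains the usual "ELF 32-bit" or "ELF 64-bit" wording; elf_bits is a hypothetical helper name.

```shell
# elf_bits DESCRIPTION
# Hypothetical helper: classify one line of file(1) output by ELF word size.
elf_bits() {
    case "$1" in
        *"ELF 32-bit"*) echo 32 ;;
        *"ELF 64-bit"*) echo 64 ;;
        *)              echo unknown ;;
    esac
}
```

Typical use would be: elf_bits "$(file test.o)". On a 32-bit i386 target, anything other than 32 means the remote side is not using the cross compiler.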

Thanks again for the suggestions

John

-- 
j...@sdf.org
SDF Public Access UNIX System - http://sdf.org


Re: distcc for pkgsrc issue

2017-06-29 Thread Greg Troxel

  [distcc woes]

Two thoughts:

  I have seen problems where distcc does not seem to work with cwrappers
  enabled.  However, I have not gotten around to really figuring this
  out.  You might try with USE_CWRAPPERS=no.

  Have you tried to build a simple C program with distcc, not using
  pkgsrc?



distcc for pkgsrc issue

2017-06-29 Thread John Halfpenny
Hi all.

I have an old Celeron running NetBSD i386 which runs very nicely, thank you. (:

But to save wasting time I'd like to compile pkgsrc programs on a fast, 
multicore Linux machine. This machine runs debian (9/x86_64).

Following https://wiki.netbsd.org/tutorials/pkgsrc/cross_compile_distcc/, I 
installed gcc g++ zlib1g-dev and ncurses-base on my debian machine, and 
downloaded the NetBSD sources

 # cd /root/netbsd-distcc
 # cvs -d anon...@anoncvs.netbsd.org:/cvsroot co -rnetbsd-7-1-RELEASE src

Ran the build script

 # cd src
 # ./build.sh -m i386 tools

This left me with /root/netbsd-distcc/src/obj/tooldir.Linux-4.9.0-3-amd64-x86_64

I then installed distcc, and because systemd failed miserably in starting it 
up, wrote a little script to do that for me:

  #!/bin/csh
  setenv PATH /root/netbsd-distcc/src/obj/tooldir.Linux-4.9.0-3-amd64-x86_64/i486--netbsdelf/bin:${PATH}
  setenv DISTCC_VERBOSE 1
  distccd --allow a.b.c.d --nice 5 --jobs 7 --stats --log-file=/var/log/distccd.log

Set up the firewall, fired up distcc on Linux. Ok.

Then I went to the NetBSD machine (i386 7.1), installed distcc from pkgsrc, and 
edited /etc/mk.conf to include this:

  PKGSRC_COMPILER=ccache distcc gcc
  MAKE_JOBS=6
  DISTCC_HOSTS=mydebianpc:3632

I tried to compile rxvt in the first instance, but it fails while trying to 
build ccache with

  main.o: file not recognized: File format not recognized
  distcc[7918] ERROR: compile (null) on localhost failed

I can see the builds coming into my debian machine, the logs look like this:

 distccd[31676] (dcc_job_summary) client: a.b.c.d:65411 COMPILE_OK exit:0 sig:0 core:0 ret:0 time:146ms gcc compopt.c

I'm almost there; I can feel it in my bones. Apologies for the long post, but I'm 
trying not to miss anything important. Does anyone have any ideas which steps I 
may have missed or done wrong?

Thanks for any pointers
John

-- 
j...@sdf.org
SDF Public Access UNIX System - http://sdf.org