On 01/21/11 20:27, Roman Divacky wrote:
This patch does three things:

1) emits "call .mcount" at the beginning of every function body


The differences on i386 between profiled and non-profiled code are not
as obvious as with gcc (using diff on assembly output), but on first
inspection it looks correct.

cool :)

2) changes the driver to link in gcrt1.o instead of crt1.o

3) changes all -lfoo to -lfoo_p except when the foo ends with _s in
    the linker invocation
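
For illustration only, the link-line changes in points 2 and 3 can be sketched as below. This is not the patch itself (the real change lives in clang's C++ driver); the function name rewrite_link_args and the example arguments are hypothetical, and the rule implemented is only the one stated above: crt1.o becomes gcrt1.o, and -lfoo becomes -lfoo_p unless foo ends with _s.

```python
def rewrite_link_args(args, profiling=True):
    """Return a copy of the linker arguments adjusted for -pg (hypothetical helper)."""
    if not profiling:
        return list(args)
    out = []
    for arg in args:
        if arg == "crt1.o" or arg.endswith("/crt1.o"):
            # Point 2: link the profiling startup object instead of crt1.o.
            out.append(arg[:-len("crt1.o")] + "gcrt1.o")
        elif arg.startswith("-l") and len(arg) > 2 and not arg.endswith("_s"):
            # Point 3: -lfoo becomes -lfoo_p, except when foo ends with _s.
            out.append(arg + "_p")
        else:
            # Object files, other options, and -lfoo_s pass through unchanged.
            out.append(arg)
    return out


if __name__ == "__main__":
    print(rewrite_link_args(["/usr/lib/crt1.o", "test.o", "-lm", "-lgcc_s", "-lc"]))
```

Whether this matches what gcc does in every case is exactly the open question raised in point 3, so treat the sketch as a restatement of the rule, not as a reference.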


Maybe it is wise to follow the gcc implementation here.

ok, makes sense

I am not sure that I did the right thing, especially in (3). Anyway,
the patch works for me (i.e. produces a.out.gmon that seems to contain
meaningful data).

I would appreciate it if you guys could test and review this, and let me
know whether it is correct.


On both my systems (i386 and amd64) something goes severely wrong when
linking several objects (all compiled with -pg, this is amd64):

Perhaps the invocation of the linker still needs some work (or I must
redo my installation) but anyhow it looks like a good job. Thanks!

I rewrote the library-rewriting part to match gcc as closely as possible.
I also think that I solved your ld problem.


please revert the old patch and test the new one:

         http://lev.vlakno.cz/~rdivacky/clang-gprof.patch

I believe this one is ok (works for me just fine), please test and report
back so I can start integrating this upstream.


I performed a few quick tests on both i386 and amd64.

The problems I had with the invocation of ld appear to be solved. The behavior with respect to libraries is now identical to gcc as far as I can see.

The results from gprof also look very promising. For my test program on amd64 the gprof output when using clang is

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 42.5       4.22     4.22        0  100.00%           _mcount [5]
 22.0       6.41     2.18 14700000     0.00     0.00  f_timint [6]
 12.4       7.64     1.23 21900000     0.00     0.00  exp [10]
  8.4       8.48     0.84 22000000     0.00     0.00  vmol [9]
  5.4       9.02     0.54  6300000     0.00     0.00  f_angle [11]
  3.8       9.40     0.38        0  100.00%           .mcount (52)
  1.9       9.59     0.19  1000000     0.00     0.01  qk21 [4]
  1.9       9.78     0.19  1000000     0.00     0.00  pow [12]
  0.4       9.82     0.04   200000     0.00     0.03  qags [3]
  0.4       9.86     0.04   100000     0.00     0.00  zero [14]
  0.3       9.89     0.03   100000     0.00     0.00  qext [16]
  0.2       9.91     0.02   800000     0.00     0.00  f_apsis [15]
  0.1       9.91     0.01  2500000     0.00     0.00  fmax [17]
  0.1       9.92     0.01   100000     0.00     0.00  apsis [13]
  0.0       9.92     0.00  1000000     0.00     0.00  fmin [18]
  0.0       9.93     0.00   100000     0.00     0.03  timint [7]
  0.0       9.93     0.00   700000     0.00     0.00  tol_apsis [19]
  0.0       9.94     0.00   200000     0.00     0.00  sort [20]
  0.0       9.94     0.00        1     1.85  5334.52  main [1]
  0.0       9.94     0.00   100000     0.00     0.03  angle [8]
...

while using gcc yields

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 44.3       4.23     4.23        0  100.00%           _mcount [5]
 18.5       6.00     1.76 14700000     0.00     0.00  f_timint [6]
 13.5       7.28     1.28 21900000     0.00     0.00  exp [10]
  9.0       8.14     0.86 22000000     0.00     0.00  vmol [9]
  5.5       8.66     0.52  6300000     0.00     0.00  f_angle [11]
  4.0       9.04     0.38        0  100.00%           .mcount (52)
  2.0       9.24     0.19  1000000     0.00     0.00  pow [12]
  2.0       9.43     0.19  1000000     0.00     0.00  qk21 [4]
  0.3       9.45     0.03   100000     0.00     0.00  zero [14]
  0.3       9.48     0.03   200000     0.00     0.02  qags [3]
  0.2       9.50     0.02   100000     0.00     0.00  qext [16]
  0.2       9.52     0.02   800000     0.00     0.00  f_apsis [15]
  0.1       9.53     0.00  2500000     0.00     0.00  fmax [17]
  0.0       9.53     0.00   700000     0.00     0.00  tol_apsis [18]
  0.0       9.53     0.00   200000     0.00     0.00  sort [19]
  0.0       9.54     0.00   100000     0.00     0.00  apsis [13]
  0.0       9.54     0.00        1     2.21  4927.66  main [1]
  0.0       9.54     0.00  1000000     0.00     0.00  fmin [20]
  0.0       9.54     0.00   100000     0.00     0.02  timint [7]
  0.0       9.54     0.00   100000     0.00     0.02  angle [8]
...

To me this looks quite similar 8-)
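
To put a rough number on "quite similar", the two flat profiles can be compared row by row. The following is only a sketch: the helper names parse_flat_profile and percent_deltas are made up for this example, the parser assumes the column layout shown above (function name as the second-to-last field), and only the first three rows of each listing are copied in.

```python
def parse_flat_profile(text):
    """Map function name -> (percent_time, self_seconds) for each data row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # blank lines, "..." and other short lines
        try:
            percent, self_seconds = float(fields[0]), float(fields[2])
        except ValueError:
            continue  # header lines start with non-numeric text
        # The last field is the index such as [5]; the name precedes it.
        rows[fields[-2]] = (percent, self_seconds)
    return rows


def percent_deltas(a, b):
    """Difference in %time (a minus b) for functions present in both profiles."""
    return {name: round(a[name][0] - b[name][0], 1) for name in a if name in b}


# First three rows of the clang and gcc listings above.
CLANG = """\
 42.5       4.22     4.22        0  100.00%           _mcount [5]
 22.0       6.41     2.18 14700000     0.00     0.00  f_timint [6]
 12.4       7.64     1.23 21900000     0.00     0.00  exp [10]
"""
GCC = """\
 44.3       4.23     4.23        0  100.00%           _mcount [5]
 18.5       6.00     1.76 14700000     0.00     0.00  f_timint [6]
 13.5       7.28     1.28 21900000     0.00     0.00  exp [10]
"""

if __name__ == "__main__":
    print(percent_deltas(parse_flat_profile(CLANG), parse_flat_profile(GCC)))
```

For these rows the differences in %time are a few percentage points at most, with f_timint showing the largest gap.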

I also tested the interaction of -pg with other options and there I found an issue with -fomit-frame-pointer. Here gcc bails out, as it probably should:

gcc -pg -O2 -Wall -fomit-frame-pointer -c test.c
gcc: -pg and -fomit-frame-pointer are incompatible

while clang continues and silently generates an executable that immediately terminates with a segmentation violation when started.
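
A driver-side guard for this could look roughly like the sketch below. It is an illustration in Python, not clang or gcc code: the names IncompatibleFlags and check_profiling_flags are hypothetical, the check is a plain membership test (it does not model a later -fno-omit-frame-pointer overriding an earlier option), and only the message text is taken from the gcc output above.

```python
class IncompatibleFlags(Exception):
    """Raised when two command-line options cannot be combined (hypothetical)."""


def check_profiling_flags(args):
    """Reject -pg together with -fomit-frame-pointer instead of building a broken binary."""
    if "-pg" in args and "-fomit-frame-pointer" in args:
        raise IncompatibleFlags("-pg and -fomit-frame-pointer are incompatible")


if __name__ == "__main__":
    try:
        check_profiling_flags(["-pg", "-O2", "-Wall", "-fomit-frame-pointer", "-c", "test.c"])
    except IncompatibleFlags as err:
        print("error:", err)
```

Failing early like this turns a crash at program start into a diagnostic at compile time, which is the behavior gcc shows here.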

Another minor, unrelated issue I found is that this version of clang on i386 generates SSE2 instructions by default, while gcc and clang in -CURRENT generate the "classical" i387 instructions.

Kind regards,

Hans Ottevanger
_______________________________________________
freebsd-toolchain@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"
