On Thu, 12 Nov 2015, Bernd Schmidt wrote:
> > I've run it through make -k check-c regtesting. These are new fails, all
> > mysterious:
>
> These would have to be investigated first.
Any specific suggestions? The PTX code emitted by GCC differs only in the
prologue/epilogue, so I think whatever is broken is unlikely to be caused by
this change. I can give it another try after upgrading the CUDA driver and
cuda-gdb from 7.0 to the latest version.

> > + sz = (sz + keep_align - 1) & ~(keep_align - 1);
>
> Use the ROUND_UP macro.

OK, thanks.

> > + fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
> > + bits == 64 ? ".wide" : "", bits);
>
> Use a shift.

I think mul is acceptable here: the PTX JIT handles it properly, judging from
what I saw while investigating in cuda-gdb. If I used a shift, I would also
have to add another instruction for the widening integer conversion in the
64-bit case. Do you insist?

> > +
> > + if (need_softstack_decl)
> > + {
> > + fprintf (asm_out_file, ".extern .shared .u64 __nvptx_stacks[];\n;");
> > + }
>
> Lose excess braces.

OK.

> > +.global .u64 %__softstack[16384];
>
> Maybe declare it as .u8 so you don't have two different constants for the
> stack size?

OK, with ".align 8" to ensure 64-bit alignment.

> > + .reg .u64 %stackptr;
> > + mov.u64 %stackptr, %__softstack;
> > + cvta.global.u64 %stackptr, %stackptr;
> > + add.u64 %stackptr, %stackptr, 131072;
> > + st.shared.u64 [__nvptx_stacks], %stackptr;
> > +
>
> I'm guessing you have other missing pieces for setting this up for multiple
> threads.

This is crt0.s, which is linked in only for single-threaded testing with
-mmainkernel; for OpenMP, the intention is to handle it in the file that
implements libgomp_nvptx_main.

Thanks.
Alexander
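For illustration only, a minimal standalone C sketch of the arithmetic behind three of the review points: rounding a frame size up to a power-of-two alignment, multiply-widening versus shifting for the per-thread stack offset, and the soft-stack size constants (16384 64-bit slots vs. the 131072-byte offset in crt0.s). ROUND_UP is redefined locally and is assumed to behave like GCC's macro of the same name for power-of-two alignments; round_up_pow2, mul_wide_u32, shift_widened, shift_unwidened and the SOFTSTACK_* macros are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in, assumed equivalent to GCC's ROUND_UP for power-of-two
   alignments: the same mask trick as the quoted patch line.  */
#define ROUND_UP(x, align) (((x) + (align) - 1) & ~((align) - 1))

/* Hypothetical: 16384 .u64 slots equal the 131072-byte offset added to
   %stackptr in crt0.s; a .u8 array would need only the byte count.  */
#define SOFTSTACK_WORDS 16384
#define SOFTSTACK_BYTES (SOFTSTACK_WORDS * 8)
_Static_assert (SOFTSTACK_BYTES == 131072, "two constants must agree");

/* Round a frame size up to KEEP_ALIGN, which must be a power of two.  */
static uint64_t
round_up_pow2 (uint64_t sz, uint64_t keep_align)
{
  assert (keep_align != 0 && (keep_align & (keep_align - 1)) == 0);
  return ROUND_UP (sz, keep_align);
}

/* Models PTX mul.wide.u32: 32-bit operands, full 64-bit product.  */
static uint64_t
mul_wide_u32 (uint32_t x, uint32_t y)
{
  return (uint64_t) x * y;
}

/* Shift alternative: widen first (cvt.u64.u32 in PTX), then shift.  */
static uint64_t
shift_widened (uint32_t x, unsigned log2_bits)
{
  return (uint64_t) x << log2_bits;
}

/* Shift without widening: the high bits of the result are lost.  */
static uint32_t
shift_unwidened (uint32_t x, unsigned log2_bits)
{
  return x << log2_bits;
}
```

With these helpers, round_up_pow2 (13, 8) yields 16, and for x = 0xFFFFFFFF both mul_wide_u32 (x, 64) and shift_widened (x, 6) give 0x3FFFFFFFC0, while shift_unwidened (x, 6) keeps only 0xFFFFFFC0. That difference is the extra widening step a shift-based sequence would need in the 64-bit case, which mul.wide performs implicitly.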