Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc

2016-02-15 Thread Mark Millard
On 2016-Feb-15, at 1:20 PM, Mark Millard  wrote:
> 
> On 2016-Feb-15, at 12:18 PM, Roman Divacky  wrote:
>> 
>> On Mon, Feb 15, 2016 at 12:17:50PM -0800, Mark Millard wrote:
>>> On 2016-Feb-15, at 11:11 AM, Roman Divacky  wrote:
 
 Mark, I believe you're right. What do you think about this patch?
 
 Index: tools/clang/lib/CodeGen/TargetInfo.cpp
 ===
 --- tools/clang/lib/CodeGen/TargetInfo.cpp (revision 260852)
 +++ tools/clang/lib/CodeGen/TargetInfo.cpp (working copy)
 @@ -3599,6 +3599,8 @@
 {
   CGF.EmitBlock(UsingOverflow);
 
 +Builder.CreateStore(Builder.getInt8(11), NumRegsAddr);
 +
   // Everything in the overflow area is rounded up to a size of at least 4.
   CharUnits OverflowAreaAlign = CharUnits::fromQuantity(4);
 
 
 Can you test it?
>>> 
>>> It may be later today before I can start the the test process.
>>> 
>>> While your change is not wrong as presented, it does seem to be based on 
>>> the ABI document's numbering with the range 3 <= gr <12, where 3 <= gr < 11 
>>> cover r3-r10 use and gr=11 implies overflow stack area use. (gr being the 
>>> ABI documents name.)
>>> 
>>> The clang code generation that I saw while analyzing the problem and the 
>>> clang source that you had me look at did not use that numbering. Instead it 
>>> seems to be based on 0 <= gpr < 9, where 0 <= gpr < 8 cover r3-r10 use and 
>>> gpr=8 implies overflow stack area use. (gpr being what gdb showed me as I 
>>> remember.) In other words: clang counts the number of "parameter registers" 
>>> already in use as it goes along instead of tracking register numbers that 
>>> have been used.
>>> 
>>> So assigning any value that appears to be positive and >= 8 should work, 
>>> such as:
>>> 
>>> Builder.CreateStore(Builder.getInt8(8), NumRegsAddr);
>> 
>> Can you check what number gcc uses? We want to be interoperable with gcc.
>> 
>> Anyway, thanks for testing!
>> 
>> Roman
> 
> I'll do that check of gcc 4.2.1 code generation before starting the test 
> later today.
> 
> But if the clang numbering is different in gcc 4.2.1 then far more than just 
> adding a
> 
>> Builder.CreateStore(Builder.getInt8(?), NumRegsAddr)
> 
> 
> for some "?" would need to be involved in the changes in order to reach 
> compatibility.
> 
> 
> I'll note that for clang 3.8.0 the actual comparison instruction generated is 
> of the form
> 
>> cmplwi  r?,7
> 
> 
> for some r?, such as r5 or r4, and the conditional branch generated is a bgt 
> instruction.
> 
> ===
> Mark Millard
> markmi at dsl-only.net

gcc 4.2.1 generates comparison instructions for va_arg of the form:

cmplwi  cr7,r0,8

and the conditional branch generated is a "bge cr7, . . ." instruction.

So the same number range is in use by both compilers: They are compatible for 
the bounds checks for reg vs. overflow for how they count, equality 
inclusion/exclusion matching up with the specific number (8 vs. 7) used to make 
things the same overall.

Other aspects of the code generation distinctions would take me time to 
analyze. It will be a while before I will be looking at other points.


===
Mark Millard
markmi at dsl-only.net


___
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"


Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc

2016-02-15 Thread Roman Divacky
On Mon, Feb 15, 2016 at 12:17:50PM -0800, Mark Millard wrote:
> On 2016-Feb-15, at 11:11 AM, Roman Divacky  wrote:
> > 
> > Mark, I believe you're right. What do you think about this patch?
> > 
> > Index: tools/clang/lib/CodeGen/TargetInfo.cpp
> > ===
> > --- tools/clang/lib/CodeGen/TargetInfo.cpp  (revision 260852)
> > +++ tools/clang/lib/CodeGen/TargetInfo.cpp  (working copy)
> > @@ -3599,6 +3599,8 @@
> >   {
> > CGF.EmitBlock(UsingOverflow);
> > 
> > +Builder.CreateStore(Builder.getInt8(11), NumRegsAddr);
> > +
> > // Everything in the overflow area is rounded up to a size of at least 
> > 4.
> > CharUnits OverflowAreaAlign = CharUnits::fromQuantity(4);
> > 
> > 
> > Can you test it?
> 
> It may be later today before I can start the the test process.
> 
> While your change is not wrong as presented, it does seem to be based on the 
> ABI document's numbering with the range 3 <= gr <12, where 3 <= gr < 11 cover 
> r3-r10 use and gr=11 implies overflow stack area use. (gr being the ABI 
> documents name.)
> 
> The clang code generation that I saw while analyzing the problem and the 
> clang source that you had me look at did not use that numbering. Instead it 
> seems to be based on 0 <= gpr < 9, where 0 <= gpr < 8 cover r3-r10 use and 
> gpr=8 implies overflow stack area use. (gpr being what gdb showed me as I 
> remember.) In other words: clang counts the number of "parameter registers" 
> already in use as it goes along instead of tracking register numbers that 
> have been used.
> 
> So assigning any value that appears to be positive and >= 8 should work, such 
> as:
> 
> Builder.CreateStore(Builder.getInt8(8), NumRegsAddr);

Can you check what number gcc uses? We want to be interoperable with gcc.

Anyway, thanks for testing!

Roman
___
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"


Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc

2016-02-15 Thread Mark Millard
On 2016-Feb-15, at 11:11 AM, Roman Divacky  wrote:
> 
> Mark, I believe you're right. What do you think about this patch?
> 
> Index: tools/clang/lib/CodeGen/TargetInfo.cpp
> ===
> --- tools/clang/lib/CodeGen/TargetInfo.cpp(revision 260852)
> +++ tools/clang/lib/CodeGen/TargetInfo.cpp(working copy)
> @@ -3599,6 +3599,8 @@
>   {
> CGF.EmitBlock(UsingOverflow);
> 
> +Builder.CreateStore(Builder.getInt8(11), NumRegsAddr);
> +
> // Everything in the overflow area is rounded up to a size of at least 4.
> CharUnits OverflowAreaAlign = CharUnits::fromQuantity(4);
> 
> 
> Can you test it?

It may be later today before I can start the the test process.

While your change is not wrong as presented, it does seem to be based on the 
ABI document's numbering with the range 3 <= gr <12, where 3 <= gr < 11 cover 
r3-r10 use and gr=11 implies overflow stack area use. (gr being the ABI 
documents name.)

The clang code generation that I saw while analyzing the problem and the clang 
source that you had me look at did not use that numbering. Instead it seems to 
be based on 0 <= gpr < 9, where 0 <= gpr < 8 cover r3-r10 use and gpr=8 implies 
overflow stack area use. (gpr being what gdb showed me as I remember.) In other 
words: clang counts the number of "parameter registers" already in use as it 
goes along instead of tracking register numbers that have been used.

So assigning any value that appears to be positive and >= 8 should work, such 
as:

Builder.CreateStore(Builder.getInt8(8), NumRegsAddr);

A cross check on this is the clang source code below:

> llvm::Value *NumRegs = Builder.CreateLoad(NumRegsAddr, "numUsedRegs");
> . . .
> llvm::Value *CC =
> Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond");
> 
> llvm::BasicBlock *UsingRegs = CGF.createBasicBlock("using_regs");
> llvm::BasicBlock *UsingOverflow = CGF.createBasicBlock("using_overflow");
> llvm::BasicBlock *Cont = CGF.createBasicBlock("cont");
> 
> Builder.CreateCondBr(CC, UsingRegs, UsingOverflow);

I'd guess that the Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond") 
for using in Builder.CreateCondBr is a test for < 8 (unsigned test?) picking 
UsingRegs and >=8 picking UsingOverflow. 11>=8 so 11 would work.

But the clang folks might prefer that the same figure be used in both places, 
possibly with the source code naming the value once and using the name in both 
places, not that the figure is likely to change in this already PowerPC 
specific code.

In analyzing the powerpc code absent knowledge of clang's code generation 
source code I would likely have been confused by seeing such differing numbers 
in the generated code if I'd run into such. That is another reason to use the 
same figure in both places.


I continue to provide some history below for reference.

===
Mark Millard
markmi at dsl-only.net

On Mon, Feb 15, 2016 at 12:52:15AM -0800, Mark Millard wrote:
> 
> I'm top posting as the following can stand on its own fairly well.
> 
> On Sun Feb 14 23:46:14 UTC 2016 Nathan Whitehorn wrote:
> 
>> On 02/14/16 14:34, Mark Millard wrote:
>>> clang's code base is not familiar material for me nor do I have solid 
>>> reference material for the FreeBSD TARGET_ARCH=powerpc ABI rules so 
>>> the below has my guess work involved. The following code appears to 
>>> have hard wired a global, unvarying constant (8) into the test for 
>>> picking UsingRegs vs. UsingOverflow.
>> 
>> For reference, we use the standard ELF ABI 
>> (https://uclibc.org/docs/psABI-ppc.pdf).
>> -Nathan
> 
> Reviewing the Parameter Passing material in that document shows that the 
> problem is in the original specification.
> 
> And there is a more modern specification that has a fix in its wording. 
> (Which shows that I'm not likely to be wrong.) I'll reference and quote it 
> later.
> 
> First I'll explain the problem that is in psABI-ppc.pdf (the old SunSoft 1995 
> document).
> 
> First a numbering point: psABI-ppc.pdf uses "gr" matching the numeral in r3, 
> r4, . . . , r10, starting at r3 (i.e, 3). And gr indicates the next register 
> to be used, not the last one already used.
> 
> The document splits the algorithm for placement of parameters into 3 stages 
> with the following structure, intended as they have it in the document but 
> various less interesting details for my "8byte then 4byte" example omitted:
> 
>> INITIALIZING:
>> Set fr=1, gr=3, and starg to the address of
>> parameter word 1.
>> SCAN:
>> If there are no more arguments, terminate.
>> Otherwise, select one of the following
>> depending on the type of the next argument:
>> 
>> DOUBLE_OR_FLOAT
>>If fr>8 ( . . .), go to OTHER. Otherwise,
>>. . .
>> 
>> SIMPLE_ARG
>>If gr>10, go to OTHER. Otherwise, load the
>>argument value into general register gr,
>>set gr to gr+1, can goto SCAN. . . .
>> 
>> LONG_LONG
>>If gr>9, go to OTHER. 

Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc

2016-02-15 Thread Roman Divacky
Mark, I believe you're right. What do you think about this patch?

Index: tools/clang/lib/CodeGen/TargetInfo.cpp
===
--- tools/clang/lib/CodeGen/TargetInfo.cpp  (revision 260852)
+++ tools/clang/lib/CodeGen/TargetInfo.cpp  (working copy)
@@ -3599,6 +3599,8 @@
   {
 CGF.EmitBlock(UsingOverflow);
 
+Builder.CreateStore(Builder.getInt8(11), NumRegsAddr);
+
 // Everything in the overflow area is rounded up to a size of at least 4.
 CharUnits OverflowAreaAlign = CharUnits::fromQuantity(4);
 

Can you test it?

On Mon, Feb 15, 2016 at 12:52:15AM -0800, Mark Millard wrote:
> 
> I'm top posting as the following can stand on its own fairly well.
> 
> On Sun Feb 14 23:46:14 UTC 2016 Nathan Whitehorn wrote:
> 
> > On 02/14/16 14:34, Mark Millard wrote:
> > > clang's code base is not familiar material for me nor do I have solid 
> > > reference material for the FreeBSD TARGET_ARCH=powerpc ABI rules so 
> > > the below has my guess work involved. The following code appears to 
> > > have hard wired a global, unvarying constant (8) into the test for 
> > > picking UsingRegs vs. UsingOverflow.
> > 
> > For reference, we use the standard ELF ABI 
> > (https://uclibc.org/docs/psABI-ppc.pdf).
> > -Nathan
> 
> Reviewing the Parameter Passing material in that document shows that the 
> problem is in the original specification.
> 
> And there is a more modern specification that has a fix in its wording. 
> (Which shows that I'm not likely to be wrong.) I'll reference and quote it 
> later.
> 
> First I'll explain the problem that is in psABI-ppc.pdf (the old SunSoft 1995 
> document).
> 
> First a numbering point: psABI-ppc.pdf uses "gr" matching the numeral in r3, 
> r4, . . . , r10, starting at r3 (i.e, 3). And gr indicates the next register 
> to be used, not the last one already used.
> 
> The document splits the algorithm for placement of parameters into 3 stages 
> with the following structure, intended as they have it in the document but 
> various less interesting details for my "8byte then 4byte" example omitted:
> 
> > INITIALIZING:
> >  Set fr=1, gr=3, and starg to the address of
> >  parameter word 1.
> > SCAN:
> >  If there are no more arguments, terminate.
> >  Otherwise, select one of the following
> >  depending on the type of the next argument:
> > 
> >  DOUBLE_OR_FLOAT
> > If fr>8 ( . . .), go to OTHER. Otherwise,
> > . . .
> > 
> >  SIMPLE_ARG
> > If gr>10, go to OTHER. Otherwise, load the
> > argument value into general register gr,
> > set gr to gr+1, can goto SCAN. . . .
> > 
> >  LONG_LONG
> > If gr>9, go to OTHER. Otherwise, . . .
> > 
> > OTHER:
> >Arguments not otherwise handled above are
> >passed in the parameter words of the
> >caller???s stack frame. . . . Set starg to
> >starg+size, then go to SCAN.
> 
> Note that gr is not incremented by LONG_LONG or by the later OTHER usage when 
> gr>9. (That would be my example's 8 byte integer that is later followed by a 
> 4 byte one.)
> 
> That OTHER's "go to SCAN" would then lead to the following 4 byte integer in 
> my example to be put in r10 and gr then being set to 11 instead of it being 
> stored in a parameter word on the stack.
> 
> The nasty thing about this for va_list/va_arg use is that the stored 
> information does not indicate which was before vs. after in the argument 
> order: the 4 byte r10 content or the 8 byte "OTHER" content: the two orders 
> produce identical results.
> 
> This can not be correct.
> 
> The Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf is more modern and explicitly 
> deals with VR and other modern things. (Its terminology matching LONG_LONG 
> above is DUAL_GP.) But for what I'm dealing with here it has the following 
> extra wording at the very end of its OTHER section:
> 
> > If gr>9 and the type is DUAL_GP ,or . . ., or . . ., then set gr = 11 (to 
> > prevent subsequent SINGLE_GPs from being placed in registers after DUAL_GP, 
> > QUAD_GP, or EIGHT_GP arguments that would no longer fit in the registers).
> 
> 
> 
> I've left the prior information below for reference.
> 
> ===
> Mark Millard
> markmi at dsl-only.net
> 
> 
> 
> On 2016-Feb-14, at 2:34 PM, Mark Millard  wrote:
> > 
> > On 2016-Feb-14, at 11:29 AM, Roman Divacky  wrote:
> >> 
> >> Fwiw, the code to handle the vaarg is in 
> >> tools/clang/lib/CodeGen/TargetInfo.cpp:PPC32_SVR4_ABIInfo::EmitVAArg()
> >> 
> >> You can take a look to see whats wrong.
> >> 
> >> On Sat, Feb 13, 2016 at 07:03:29PM -0800, Mark Millard wrote:
> >>> I've isolated another clang 3.8.0 TARGET_ARCH=powerpc SEGV problem that 
> >>> shows up for using clang 3.8.0 to buildworld/installworld for powerpc.
> >>> 
>  ls -l -n /
> >>> 
> >>> gets a SEGV. As listed in 
> >>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207175 ( and  
> >>> 

Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc

2016-02-15 Thread Mark Millard

I'm top posting as the following can stand on its own fairly well.

On Sun Feb 14 23:46:14 UTC 2016 Nathan Whitehorn wrote:

> On 02/14/16 14:34, Mark Millard wrote:
> > clang's code base is not familiar material for me nor do I have solid 
> > reference material for the FreeBSD TARGET_ARCH=powerpc ABI rules so 
> > the below has my guess work involved. The following code appears to 
> > have hard wired a global, unvarying constant (8) into the test for 
> > picking UsingRegs vs. UsingOverflow.
> 
> For reference, we use the standard ELF ABI 
> (https://uclibc.org/docs/psABI-ppc.pdf).
> -Nathan

Reviewing the Parameter Passing material in that document shows that the 
problem is in the original specification.

And there is a more modern specification that has a fix in its wording. (Which 
shows that I'm not likely to be wrong.) I'll reference and quote it later.

First I'll explain the problem that is in psABI-ppc.pdf (the old SunSoft 1995 
document).

First a numbering point: psABI-ppc.pdf uses "gr" matching the numeral in r3, 
r4, . . . , r10, starting at r3 (i.e, 3). And gr indicates the next register to 
be used, not the last one already used.

The document splits the algorithm for placement of parameters into 3 stages 
with the following structure, intended as they have it in the document but 
various less interesting details for my "8byte then 4byte" example omitted:

> INITIALIZING:
>  Set fr=1, gr=3, and starg to the address of
>  parameter word 1.
> SCAN:
>  If there are no more arguments, terminate.
>  Otherwise, select one of the following
>  depending on the type of the next argument:
> 
>  DOUBLE_OR_FLOAT
> If fr>8 ( . . .), go to OTHER. Otherwise,
> . . .
> 
>  SIMPLE_ARG
> If gr>10, go to OTHER. Otherwise, load the
> argument value into general register gr,
> set gr to gr+1, can goto SCAN. . . .
> 
>  LONG_LONG
> If gr>9, go to OTHER. Otherwise, . . .
> 
> OTHER:
>Arguments not otherwise handled above are
>passed in the parameter words of the
>caller’s stack frame. . . . Set starg to
>starg+size, then go to SCAN.

Note that gr is not incremented by LONG_LONG or by the later OTHER usage when 
gr>9. (That would be my example's 8 byte integer that is later followed by a 4 
byte one.)

That OTHER's "go to SCAN" would then lead to the following 4 byte integer in my 
example to be put in r10 and gr then being set to 11 instead of it being stored 
in a parameter word on the stack.

The nasty thing about this for va_list/va_arg use is that the stored 
information does not indicate which was before vs. after in the argument order: 
the 4 byte r10 content or the 8 byte "OTHER" content: the two orders produce 
identical results.

This can not be correct.

The Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf is more modern and explicitly 
deals with VR and other modern things. (Its terminology matching LONG_LONG 
above is DUAL_GP.) But for what I'm dealing with here it has the following 
extra wording at the very end of its OTHER section:

> If gr>9 and the type is DUAL_GP ,or . . ., or . . ., then set gr = 11 (to 
> prevent subsequent SINGLE_GPs from being placed in registers after DUAL_GP, 
> QUAD_GP, or EIGHT_GP arguments that would no longer fit in the registers).



I've left the prior information below for reference.

===
Mark Millard
markmi at dsl-only.net



On 2016-Feb-14, at 2:34 PM, Mark Millard  wrote:
> 
> On 2016-Feb-14, at 11:29 AM, Roman Divacky  wrote:
>> 
>> Fwiw, the code to handle the vaarg is in 
>> tools/clang/lib/CodeGen/TargetInfo.cpp:PPC32_SVR4_ABIInfo::EmitVAArg()
>> 
>> You can take a look to see whats wrong.
>> 
>> On Sat, Feb 13, 2016 at 07:03:29PM -0800, Mark Millard wrote:
>>> I've isolated another clang 3.8.0 TARGET_ARCH=powerpc SEGV problem that 
>>> shows up for using clang 3.8.0 to buildworld/installworld for powerpc.
>>> 
 ls -l -n /
>>> 
>>> gets a SEGV. As listed in 
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207175 ( and  
>>> https://llvm.org/bugs/show_bug.cgi?id=26605 ) the following simplified 
>>> program also gets the SEGV on powerpc:
>>> 
 #include  // for va_list, va_start, va_arg, va_end
 #include  // for intmax_t
 
 intmax_t
 va_test (char *s, ...)
 {
   va_list vap;
 
   va_start(vap, s);
 
   char*t0 = va_arg(vap, char*);
   unsigned int o0 = va_arg(vap, unsigned int);
   int  c0 = va_arg(vap, int);
   unsigned int u0 = va_arg(vap, unsigned int);
   int  c1 = va_arg(vap, int);
   char *   t1 = va_arg(vap, char*);
 
   intmax_t j0 = va_arg(vap, intmax_t); // This spans into 
 overflow_arg_area.
 
   int  c2 = va_arg(vap, int);  // A copy was put in the 
// overflow_arg_area because of 
 the