PIE/PIC issue ...w.r.t linker variable

2016-02-12 Thread Umesh Kalappa
Hi Guys ,

We have an issue with the code below when we enable the PIE options
(-fpie/-pie), i.e.

main.c
extern int *my_ptr ;

int main()
{
   return *my_ptr;
}

foo.s
 .syntax unified
 .cpu cortex-m0
 .fpu softvfp
  .thumb
  .global my_ptr
   .global my_var
.data
.align  2
   .type   my_ptr, %object
   .size   my_ptr, 4
my_ptr:
  .word   my_var   //where my_var is the linker variable

custom.ld  (linker script)
/* Set stack top to end of RAM, and stack limit move down by
 * size of stack_dummy section */
my_var = 20;
__StackTop = ORIGIN(RAM) + LENGTH(RAM);
__StackLimit = __StackTop - SIZEOF(.stack_dummy);
PROVIDE(__stack = __StackTop);


command used

arm-none-eabi-gcc -c -fPIC main.c -mthumb -mcpu=cortex-m0

arm-none-eabi-gcc -c -fPIC foo.S -mthumb -mcpu=cortex-m0
arm-none-eabi-gcc -c -fPIC \
  /home/egoumal/Downloads/gcc-arm-none-eabi-5_2-2015q4/share/gcc-arm-none-eabi/samples/startup/startup_ARMCM0.S \
  -mthumb -mcpu=cortex-m0 -D__STARTUP_CLEAR_BSS -D__START=main

arm-none-eabi-ld -pie main.o foo.o startup_ARMCM0.o -L. \
  -L/home/egoumal/Downloads/gcc-arm-none-eabi-5_2-2015q4/share/gcc-arm-none-eabi/samples/ldscripts \
  -T nokeep.ld -Map=test.map -o test

We expect the my_ptr value to be 20, but we see the value 0. Without
the PIE option, my_ptr has the value 20.

Are we missing something here, or is the value 0 expected (which seems incorrect)?


Thank you; any light on this is appreciated.
~Umesh


Re: PIE/PIC issue ...w.r.t linker variable

2016-02-12 Thread Umesh Kalappa
Hi Kyrill ,
Thank you for the info. Before I file a bug, I need to confirm whether it is a bug or not.

Thank you
~Umesh

On Fri, Feb 12, 2016 at 3:00 PM, Kyrill Tkachov
 wrote:
> Hi,
>
>
> On 12/02/16 09:19, Umesh Kalappa wrote:
>>
>> Hi Guys ,
>>
>> we  do have a issue with below code ,When we enabled the pie (-fpie/pie)
>> option
>>   i.e
>>
>> main.c
>> extern int *my_ptr ;
>>
>> int main()
>> {
>> return *my_ptr;
>> }
>>
>> foo.s
>>   .syntax unified
>>   .cpu cortex-m0
>>   .fpu softvfp
>>.thumb
>>.global my_ptr
>> .global my_var
>>  .data
>>  .align  2
>> .type   my_ptr, %object
>> .size   my_ptr, 4
>> my_ptr:
>>.word   my_var   //where my_var is the linker variable
>>
>> custom.ld  (linker script)
>> /* Set stack top to end of RAM, and stack limit move down by
>>  190  * size of stack_dummy section */
>>  191 my_var = 20;
>>  192 __StackTop = ORIGIN(RAM) + LENGTH(RAM);
>>  193 __StackLimit = __StackTop - SIZEOF(.stack_dummy);
>>  194 PROVIDE(__stack = __StackTop);
>>
>>
>> command used
>>
>>3  arm-none-eabi-gcc -c -fPIC main.c -mthumb -mcpu=cortex-m0
>>4
>>5 arm-none-eabi-gcc -c -fPIC foo.S -mthumb -mcpu=cortex-m0
>>6 arm-none-eabi-gcc -c -fPIC
>>
>> /home/egoumal/Downloads/gcc-arm-none-eabi-5_2-2015q4/share/gcc-arm-none-eabi/samples/startup/startu
>> p_ARMCM0.S -mthumb -mcpu=cortex-m0 -D__STARTUP_CLEAR_BSS
>> -D__START=main
>>7
>>8 arm-none-eabi-ld -pie main.o  foo.o startup_ARMCM0.o -L.
>> -L/home/egoumal/Downloads/gcc-arm-none-eabi-5_2-2015q4/share/gcc-ar
>>  m-none-eabi/samples/ldscripts -T nokeep.ld -Map=test.map -o test
>>
>> we expect my_ptr value to be 20 ,but we do see the value 0 and without
>> pie option ,the my_ptr has the value 20 .
>>
>> do we missing something here  or value 0 expected (which is incorrect)
>
>
> note that gcc-bugs is the list where the automatic bug tracker sends all the
> emails
> logging almost all activity, so your email would be lost there...
> Please file a bug report in bugzilla according to https://gcc.gnu.org/bugs/
>
> Thanks,
> Kyrill
>
>
>>
>> Thank you and appreciate any lights on this
>> ~Umesh
>>
>


Change the aarch64 abi ...(Custom /Specific change)

2016-04-04 Thread Umesh Kalappa
Hi All,

We are in the process of changing the GCC compiler for the aarch64 ABI with
respect to the handling of varargs function arguments.

By default (LP64), 1-, 2- and 4-byte arguments are promoted to word size, i.e. 4
bytes. We need to change this behaviour to 8 bytes (double word).

We are looking at hooks such as PROMOTE_MODE and
TARGET_PROMOTE_FUNCTION_MODE to make the changes.

The arm port has definitions for these macros and we are trying to tweak that
code. Any expert suggestions will help us implement this.

The GCC compiler version is 4.7.0.


Thank you
~Umesh


Re: Change the aarch64 abi ...(Custom /Specific change)

2016-04-05 Thread Umesh Kalappa
Thank you, Jim, for the input.

I need to make the changes only to function arguments (varargs), so will
making the changes in TARGET_PROMOTE_FUNCTION_MODE be enough?

One more question: I have defined TARGET_PROMOTE_FUNCTION_MODE
(arm.c) and am cross-compiling for aarch64, but GCC still calls
default_promote_function_mode, i.e.

Breakpoint 2, promote_function_mode (type=0x2acd3540, mode=DImode,
punsignedp=0x7fffd81c, funtype=0x2ab79c78,
for_return=0) at /nobackup/ukalappa/gcc/4.7.0/gcc/gcc/explow.c:781
781   if (type == NULL_TREE)
(gdb) s
779 {
(gdb) s
781   if (type == NULL_TREE)
(gdb) s
791   switch (TREE_CODE (type))
(gdb) s
796   return targetm.calls.promote_function_mode (type, mode,
punsignedp, funtype,
(gdb) s
default_promote_function_mode (type=0x2acd3540, mode=DImode,
punsignedp=0x7fffd81c, funtype=0x2ab79c78,
for_return=0) at /nobackup/ukalappa/gcc/4.7.0/gcc/gcc/targhooks.c:127
127   if (type != NULL_TREE && for_return == 2)


I'm sure we are missing something here and we are looking for it. Any input
will help us a lot.

Thank you
~Umesh

On Tue, Apr 5, 2016 at 5:01 AM, Jim Wilson  wrote:
> On 04/04/2016 08:55 AM, Umesh Kalappa wrote:
>>
>> We are in process of changing the gcc compiler for aarch64 abi ,w.r.t
>> varargs  function arguments handling.
>>
>> default(LP64) ,where 1,2,4 bytes  args are promoted to word size i.e 4
>> bytes ,we need to change these behaviour to 8 bytes (double word).
>>
>> we are looking both hooks like  PROMOTE_MODE and
>> TARGET_PROMOTE_FUNCTION_MODE to make the changes.
>
>
> I think this would work.  You just need to promote all modes less than 8
> bytes to DImode, instead of the current code that promotes modes smaller
> than 4 bytes to SImode.  You would do this for the default LP64 type system,
> but not for the ILP32 type system.
>
> This would affect all function arguments and locals, which may cause code
> size and/or performance issues.  You would have to check for that.
>
> Also, this may prevent linking with any 3rd party code compiled by
> unmodified gcc, or code compiled with other compilers (e.g. LLVM), because
> changing TARGET_PROMOTE_FUNCTION_MODE can cause ABI changes. You may need to
> check that also.
>
> Jim
>
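
As a rough illustration of the rule Jim describes, the following self-contained C sketch models the promotion decision outside of GCC. The `enum int_mode` values and both function names are hypothetical stand-ins for GCC's machine modes and target hook, not the real API; only the 4-byte and 8-byte thresholds come from the thread.

```c
/* Hypothetical stand-ins for GCC integer machine modes; each value is the
   mode size in bytes. */
enum int_mode { MODE_QI = 1, MODE_HI = 2, MODE_SI = 4, MODE_DI = 8 };

/* Behaviour described in the thread as the current one: anything narrower
   than 4 bytes is widened to SImode. */
enum int_mode promote_to_word(enum int_mode m)
{
    return m < MODE_SI ? MODE_SI : m;
}

/* Requested behaviour: under LP64 widen everything narrower than 8 bytes
   to DImode; under ILP32 keep the existing rule. */
enum int_mode promote_for_abi(enum int_mode m, int is_lp64)
{
    if (!is_lp64)
        return promote_to_word(m);
    return m < MODE_DI ? MODE_DI : m;
}
```

In this model a 4-byte argument stays SImode under ILP32 and becomes DImode under LP64, which is the ABI-visible difference Jim warns about.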


ARM gold unknown option.

2016-05-16 Thread Umesh Kalappa
Hi All ,

We are migrating to the gold linker and see the issue below:


bash-4.1$ /auto/compiler-migration/bin/armeb-linux-gnueabi-ld.gold --be8

/auto/compiler-migration/bin/armeb-linux-gnueabi-ld.gold: --be8: unknown option


Any help will be appreciated.

Thank you
~Umesh


Re: ARM gold unknown option.

2016-05-17 Thread Umesh Kalappa
On top of that,

how do I enable byte-invariant addressing mode for gold?

Thank you
~Umesh

On Tue, May 17, 2016 at 11:04 AM, Umesh Kalappa
 wrote:
> Hi All ,
>
> We are migrating to the gold linker and see the below issue
>
>
> bash-4.1$ /auto/compiler-migration/bin/armeb-linux-gnueabi-ld.gold --be8
>
> /auto/compiler-migration/bin/armeb-linux-gnueabi-ld.gold: --be8: unknown 
> option
>
>
> Any help ,will be appreciated .
>
> Thank you
> ~Umesh


Replacement for the .stabs directive

2016-08-19 Thread Umesh Kalappa
Hello Everyone ,

We have legacy code that uses the .stabs directive quite often
in the source, like

.stabs "symbol_name", 100, 0, 0, 0 + .label_one f;

.label_one
 stmt


The above code is wrapped in inline asm in the C source file.

We are using clang 3.8 (with LTO) and, as you know, the built-in
assembler / MC streamer doesn't support the .stabs directive.

We are looking to emulate the above .stabs semantics, i.e. associate
the label_one address with "symbol_name", using DWARF or any gas
directives.

Any suggestions on how we can achieve that?


Thank you
~Umesh


Stack offset computation for incoming arguments.

2014-04-25 Thread Umesh Kalappa
Hi All,

Our private backend defines the following macros:

#define FIRST_PARM_OFFSET(FNDECL)  (get_frame_size() + \
                                    STARTING_FRAME_OFFSET + RETURN_BYTES)

#define STARTING_FRAME_OFFSET  1

#define STACK_POINTER_REGNUM 10

#define FRAME_POINTER_REGNUM STACK_POINTER_REGNUM

#define ARG_POINTER_REGNUM STACK_POINTER_REGNUM

#define RETURN_BYTES  2

The instantiate_virtual_regs pass sets in_arg_offset as

in_arg_offset = FIRST_PARM_OFFSET (current_function_decl);

The computed frame_size is X, so in_arg_offset is X + 1 + 2, i.e. X + 3.

The pass then calls instantiate_virtual_regs_in_rtx, which emits
the insns that access the incoming arguments (those passed on the
stack) via stack_pointer_rtx + in_arg_offset (the stack grows downward).

But in the reload pass, the alter_reg function (which spills a reg to the
stack) expands the stack frame and updates frame_size, i.e. X = X +
allocated space.

frame_size is updated for each spill, so at the end of the reload
pass frame_size = frame_size + total_allocated_spill_space.

Our prologue code looks like

void expand_prologue()
{
  /* At this point frame_size already includes the space allocated for
     spills.  */
  HOST_WIDE_INT frame_size = get_frame_size ();
  rtx reg, insn;

  if (frame_size > 0)
    {
      insn = emit_move_insn (stack_pointer_rtx,
                             gen_rtx_MINUS (GET_MODE (stack_pointer_rtx),
                                            stack_pointer_rtx,
                                            GEN_INT (frame_size)));
      RTX_FRAME_RELATED_P (insn) = 1;
    }
}


The problem is that frame_size is not the same when spill code is
generated between instantiate_virtual_regs_in_rtx() and
expand_prologue(), so the incoming argument offset goes wrong, like

sub sp, 10       // prologue, where frame_size = 10

ld  R1, sp       // accessing the first argument that is passed on the stack
add R1, 11       // the offset should be 13, i.e. frame_size + 1 + 2, but it is
                 // 11 (stack_pointer_rtx + (in_arg_offset = 11), where
                 // frame_size is 8 for locals)
ld  R2, [R1]

ld  [sp+9], R10  // spill code

add sp, 10       // epilogue

In the above asm the fetch of the incoming arguments goes wrong.
Any idea what is going wrong with the offset computation for incoming
arguments here?

Thank you
~Umesh


Variadic functions arguments passing

2014-04-25 Thread Umesh Kalappa
Hi All,

In our private port, we define the function_arg hook to pass the first
three arguments in registers; the rest go on the stack.

But for variadic functions, the arguments need to be passed on the stack.

How can we achieve this? Any input will be appreciated.

Thank you
~Umesh
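
The decision asked about here can be modelled in a few lines of C. This is only a sketch under stated assumptions: `struct arg_state`, `assign_arg` and `PASS_ON_STACK` are hypothetical names loosely modelled on a port's cumulative-argument state, not GCC's interface. In a real port the `is_variadic` flag would presumably be filled in when that state is initialised for a call, for example from the callee's function type.

```c
#define NUM_ARG_REGS 3        /* first three arguments go in registers, per the post */
#define PASS_ON_STACK (-1)

/* Hypothetical per-call state, loosely modelled on CUMULATIVE_ARGS. */
struct arg_state {
    int next_reg;     /* index of the next free argument register */
    int is_variadic;  /* nonzero when the callee's type is variadic */
};

/* Returns the register index for the next argument, or PASS_ON_STACK.
   For a variadic callee no register is ever handed out. */
int assign_arg(struct arg_state *st)
{
    if (st->is_variadic || st->next_reg >= NUM_ARG_REGS)
        return PASS_ON_STACK;
    return st->next_reg++;
}
```

The design choice in this sketch is to record "variadic" once per call rather than per argument, so that named and unnamed arguments of a variadic function are treated alike.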


Re: Stack offset computation for incoming arguments.

2014-04-25 Thread Umesh Kalappa
Thank you, Eric, for the input; we will make the required changes.

Thank you Again
~Umesh

On Fri, Apr 25, 2014 at 9:58 PM, Eric Botcazou  wrote:
>> #define FIRST_PARM_OFFSET(FNDECL)  (get_frame_size() +
>> STARTING_FRAME_OFFSET  + RETURN_BYTES )
>
> I don't think that you can define FIRST_PARM_OFFSET like so, you need to have
> a fixed FIRST_PARM_OFFSET (for some definition of fixed) and eliminate the
> argument pointer during reload.
>
> --
> Eric Botcazou


Change the calling conventions only for the intrinsic functions.

2014-05-07 Thread Umesh Kalappa
Hi All ,



We are porting GCC 4.8.1 to customized hardware, where the
current calling convention passes arguments on the stack and
returns the value in a register.

But we have some intrinsic functions (supplied by the hardware
folks) whose calling convention passes both the arguments and the
return value on the stack.

So the question is: the backend currently uses the usual convention,
where arguments are passed on the stack and the return value in a register.
How can we model that only the intrinsics use the caller's stack space
for the return value instead of registers?


We would appreciate any hints on this.



Thank you

~Umesh


Blkcopy for outgoing arguments.

2014-05-09 Thread Umesh Kalappa
Hi All,
Good day there.

We are porting GCC 4.8.1 and defined the calling convention as

if (sizeof(arg) <= 2)
    pass in a register
else
    pass on the stack

The problem is that the code is bloated for datatypes where
sizeof(datatype) > 2, i.e.

we see sequences of loads from the stack (for locals) and stores
to the stack (outgoing args) like

ld reg,(fp-d-1)
ld (sp+d),reg
ld reg,(fp-d-2)
ld (sp+d+1),reg
ld reg,(fp-d-3)
ld (sp+d+2),reg
ld reg,(fp-d-4)
ld (sp+d+3),reg
…….
and so on for larger type sizes.

So we thought of having a blkcopy intrinsic function that takes the
addresses of src and dst along with the size and alignment info.

So we defined some of the GCC macros like

#define MOVE_MAX  2
#define MOVE_BY_PIECES_P(SIZE, ALIGN) \
  ( (SIZE) == 1 || (SIZE) == 2  )

and defined the movmemhi pattern (word size is 2 bytes) as

;; Argument 0 is the destination
;; Argument 1 is the source
;; Argument 2 is the length
;; Argument 3 is the alignment

(define_expand "movmemhi"
  [(match_operand 0 "general_operand" "")
   (match_operand 1 "general_operand" "")
   (match_operand 2 "register_operand" "")
   (match_operand 3 "" "")
   ]
  ""
{
rtx libfunc = init_one_libfunc ("blkcopy ");
emit_library_call_value (libfunc,NULL_RTX, LCT_NORMAL, VOIDmode,
   2, operands[0],
HImode,operands[1], HImode);
DONE;
}

  )

After all this, we still see the sequences of loads from the stack
(for locals) and stores to the stack (outgoing args), and the backend is not
able to emit the libcall for the block copy.

We are debugging the issue and in the meantime would appreciate any help from
the group.


Thank you; waiting for any reply.
~Umesh
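
For reference, a helper of the kind described in this post could look like the C sketch below. It is a hypothetical reference implementation, not the poster's routine: the post says the helper takes source, destination, size and alignment, so the three-parameter signature here is an assumption, and the alignment argument is left out for brevity. Note that the expander as posted forwards only two operands to the library call, so a helper with a size parameter would also need the length operand passed along.

```c
#include <stddef.h>

/* Hypothetical reference implementation of the blkcopy helper named in the
   post: copies size bytes from src to dst, one byte at a time.  The two
   regions must not overlap. */
void blkcopy(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (size--)
        *d++ = *s++;
}
```

A byte loop is the simplest correct form; a tuned version would typically copy in word-sized units when the alignment allows it.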


error: unrecognizable insn:

2014-05-17 Thread Umesh Kalappa
Dear All,
We are porting GCC 4.8.1 to our private target.
The compiler fails with the error below.

error: unrecognizable insn:

(insn 22 21 32 6 (set (reg:HI 30 [ D.1532 ])
(plus:HI (symbol_ref:HI ("stringArray")  )
(const_int -1 [0x]))) -1
 (nil))
altcon_014.c:130:1: internal compiler error: in extract_insn, at recog.c:2150
0x836b497 _fatal_insn(char const*, rtx_def const*, char const*, int,
char const*)
../../../src/gcc-4.8.1/gcc/rtl-error.c:109
0x836b4e1 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../../src/gcc-4.8.1/gcc/rtl-error.c:117
0x8345974 extract_insn(rtx_def*)
../../../src/gcc-4.8.1/gcc/recog.c:2150
0x825072e instantiate_virtual_regs_in_insn
../../../src/gcc-4.8.1/gcc/function.c:1561
0x825072e instantiate_virtual_regs
../../../src/gcc-4.8.1/gcc/function.c:1928

Below is the movhi template defined in the md file for a
HImode operation

define_insn "*movhi"
  [(set (match_operand:HI 0 "general_mov_lhs_operand" "=r,=r,=r,=r,R,A,=b,=r")
(match_operand:HI 1 "general_mov_operand" "r,R ,A ,I ,r,r,S,S"))]

and respective  predicates are defined  as

(define_predicate "general_mov_lhs_operand"
  (match_code "mem,reg,subreg,const")
{
  /* Any (MEM LABEL_REF) is OK.  That is a pc-relative load.  */
  if (GET_CODE (op) == MEM && GET_CODE (XEXP (op, 0)) == LABEL_REF)
    return 1;
  /* (MEM SYMBOL_REF) */
  else if (GET_CODE (op) == MEM && GET_CODE (XEXP (op, 0)) == SYMBOL_REF)
    return 1;
  else if (GET_CODE (op) == MEM && GET_CODE (XEXP (op, 0)) == CONST_INT)
    return 1;
  return nonimmediate_operand (op, mode);
})

(define_predicate "general_mov_operand"
  (match_code "mem,reg,subreg,symbol_ref,label_ref,const,plus,const_int,const_double")
{
  /* Any (MEM LABEL_REF) is OK.  That is a pc-relative load.  */
  if (GET_CODE (op) == MEM && GET_CODE (XEXP (op, 0)) == LABEL_REF)
    return 1;
  else if (GET_CODE (op) == MEM && GET_CODE (XEXP (op, 0)) == CONST_INT)
    return 1;
  /* (MEM SYMBOL_REF) */
  else if (GET_CODE (op) == MEM && GET_CODE (XEXP (op, 0)) == SYMBOL_REF)
    return 1;
  return general_operand (op, mode);
})

If we modify the general_mov_operand predicate as

(define_predicate "general_mov_operand"
  (match_code "mem,reg,subreg,symbol_ref,label_ref,const,plus,const_int,const_double")
{
  /* Any (MEM LABEL_REF) is OK.  That is a pc-relative load.  */
  if (GET_CODE (op) == MEM && GET_CODE (XEXP (op, 0)) == LABEL_REF)
    return 1;
  else if (GET_CODE (op) == MEM && GET_CODE (XEXP (op, 0)) == CONST_INT)
    return 1;
  /* (MEM SYMBOL_REF) */
  else if (GET_CODE (op) == MEM && GET_CODE (XEXP (op, 0)) == SYMBOL_REF)
    return 1;
  else if (GET_CODE (op) == PLUS && GET_CODE (XEXP (op, 0)) == SYMBOL_REF)
    return 1;

  return general_operand (op, mode);
})


The compilation goes smoothly, but I'm not sure why general_operand
couldn't match the failing insn.

Any pointers will be appreciated. Do our changes to the predicate
file make any sense?

Thank you in advance
~Umesh
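
One possible explanation, offered as an assumption rather than a confirmed diagnosis: an operand test in the style of general_operand accepts registers, memory references and constants, and a symbol plus offset only counts as a constant when it is wrapped as (const (plus (symbol_ref) (const_int))). A bare (plus ...) is an arithmetic expression, so it falls through. The self-contained C model below illustrates that distinction; every type and function name in it is hypothetical and much simpler than GCC's real RTL.

```c
/* Tiny hypothetical model of RTL expression codes; not GCC's real types. */
enum rtx_kind {
    X_REG, X_MEM, X_SUBREG, X_CONST_INT,
    X_SYMBOL_REF, X_LABEL_REF, X_CONST, X_PLUS
};

struct rtx_node {
    enum rtx_kind kind;
    const struct rtx_node *op0, *op1;   /* operands, 0 when unused */
};

/* Kinds that count as constants in this model. */
int is_constant(const struct rtx_node *x)
{
    return x->kind == X_CONST_INT || x->kind == X_SYMBOL_REF
        || x->kind == X_LABEL_REF || x->kind == X_CONST;
}

/* Simplified operand test: constants, registers, subregs and memory
   references pass; a bare arithmetic expression such as PLUS does not. */
int simple_general_operand(const struct rtx_node *x)
{
    return is_constant(x) || x->kind == X_REG || x->kind == X_SUBREG
        || x->kind == X_MEM;
}
```

Under this model the insn in the error message fails because its source is an unwrapped PLUS, while the same expression inside a CONST wrapper would be accepted.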


ELIMINABLE_REGS and INITIAL_ELIMINATION_OFFSET effectiveness.

2014-05-26 Thread Umesh Kalappa
Dear All,

We are porting 4.8.1 to one of our private backends and defined the
macros like

#define ELIMINABLE_REGS \
 {{ ARG_POINTER_REGNUM, STACK_POINTER_REGNUM},  \
  { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM}}\

#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET) \
 (OFFSET) = tlcs_initial_elimination_offset (FROM, TO)


The above macros are effective only when optimization is enabled (-On), i.e.

static void
ira (FILE *f)
{
  bool loops_p;
  int ira_max_point_before_emit;
  int rebuild_p;
  bool saved_flag_caller_saves = flag_caller_saves;
  enum ira_region saved_flag_ira_region = flag_ira_region;

  ira_conflicts_p = optimize > 0; // optimize is zero here for the default and -O0 options


}

static void
do_reload (void)

{
{
  df_set_flags (DF_NO_INSN_RESCAN);
  build_insn_chain ();

  need_dce = reload (get_insns (), ira_conflicts_p); // ira_conflicts_p is zero

}
}
/*
   GLOBAL nonzero means we were called from global_alloc
   and should attempt to reallocate any pseudoregs that we
   displace from hard regs we will use for reloads.
   If GLOBAL is zero, we do not have enough information to do that,
   so any pseudo reg that is spilled must go to the stack.
*/

bool
reload (rtx first, int global) //global is zero
{


  /* If global-alloc was run, notify it of any register eliminations we have
     done.  */
  if (global)
    for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
      if (ep->can_eliminate)
        mark_elimination (ep->from, ep->to);
}



Are the ELIMINABLE_REGS and INITIAL_ELIMINATION_OFFSET macros
effective only when optimization is enabled, or are we missing something
here?

Thank you in advance
~Umesh
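
For readers unfamiliar with what a function such as tlcs_initial_elimination_offset has to return, here is a minimal C sketch. The frame layout, the `struct frame_info` type and the `elimination_offset` function are all assumptions made for illustration; the register numbers are the ones quoted in a later post of this archive. The idea is only that each (FROM, TO) pair listed in ELIMINABLE_REGS maps to a fixed distance that depends on the final frame size.

```c
/* Register numbers as quoted in a later post of this archive. */
enum {
    FRAME_POINTER_REGNUM = 8,
    ARG_POINTER_REGNUM = 9,
    STACK_POINTER_REGNUM = 10
};

/* Assumed frame layout (stack grows downward), from low to high addresses:
   sp -> [ locals and spill slots ] fp -> [ return address ] ap -> [ args ] */
struct frame_info {
    long frame_size;    /* locals plus spill slots, final after reload */
    long return_bytes;  /* size of the saved return address */
};

/* Store in *offset the value to add when replacing register FROM by
   register TO.  Returns 1 on success, 0 for a pair that is not listed in
   ELIMINABLE_REGS. */
int elimination_offset(const struct frame_info *f, int from, int to,
                       long *offset)
{
    if (to != STACK_POINTER_REGNUM)
        return 0;
    if (from == FRAME_POINTER_REGNUM) {
        *offset = f->frame_size;
        return 1;
    }
    if (from == ARG_POINTER_REGNUM) {
        *offset = f->frame_size + f->return_bytes;
        return 1;
    }
    return 0;
}
```

Because the offsets are computed from the frame size at the time of elimination, they stay correct even if reload has grown the frame.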


Re: Stack offset computation for incoming arguments.

2014-05-29 Thread Umesh Kalappa
Dear Eric,

As you advised, we defined the following macros:

#define ARG_POINTER_REGNUM 9
#define FRAME_POINTER_REGNUM 8
#define STACK_POINTER_REGNUM 10


#define ELIMINABLE_REGS \
 {{ ARG_POINTER_REGNUM, STACK_POINTER_REGNUM},  \
  { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM}}\

#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET) \
 (OFFSET) = tlcs_initial_elimination_offset (FROM, TO)

#define CAN_ELIMINATE 1

#define FIRST_PARM_OFFSET 3

With -O0 or the default options, the frame and arg regs are not
replaced by the stack reg, whereas they are replaced with -O1 and higher
optimisation levels.

Are macros like ELIMINABLE_REGS, INITIAL_ELIMINATION_OFFSET and
CAN_ELIMINATE effective only when optimisation is enabled, or are
we missing something here?

Thank you; any reply from the group is appreciated.

~Umesh

On Fri, Apr 25, 2014 at 9:58 PM, Eric Botcazou  wrote:
>> #define FIRST_PARM_OFFSET(FNDECL)  (get_frame_size() +
>> STARTING_FRAME_OFFSET  + RETURN_BYTES )
>
> I don't think that you can define FIRST_PARM_OFFSET like so, you need to have
> a fixed FIRST_PARM_OFFSET (for some definition of fixed) and eliminate the
> argument pointer during reload.
>
> --
> Eric Botcazou



Re: Stack offset computation for incoming arguments.

2014-05-30 Thread Umesh Kalappa
Dear Eric,

I really appreciate your reply. We made the following changes:

#define ARG_POINTER_REGNUM 8     // Fake hard reg
#define FRAME_POINTER_REGNUM 9   // Fake hard reg
#define SP_REG  10

#define ELIMINABLE_REGS { {ARG_POINTER_REGNUM,STACK_POINTER_REGNUM},\
  {ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM},   \
  {FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM} }


The ARG and FRAME regs are not marked as fixed regs, but are marked as
call-used regs.


The reload pass eliminates the arg and fp regs to sp when the
sample is compiled with the -fomit-frame-pointer option, i.e.

$t-gcc -S -fomit-frame-pointer  sample.c

But without the -fomit-frame-pointer option, the arg and fp regs are not
replaced with sp.

Could you please help us with any hints?

Thank you
~Umesh

On Fri, May 30, 2014 at 4:24 PM, Eric Botcazou  wrote:
>> ARG_POINTER_REGNUM and FRAME_POINTER_REGNUM need to be pseudo-registers if
>> they do not represent real registers.
>
> The wording "pseudo registers" is obviously a bit confusing in this context...
>
> If ARG_POINTER_REGNUM and FRAME_POINTER_REGNUM do not represent real registers
> then they need to be fake hard registers, i.e. hard registers according to the
> FIRST_PSEUDO_REGISTER macro but with an arbitrary REGNUM (typically just below
> the FIRST_PSEUDO_REGISTER macro).  See the numerous examples in the tree.
>
> --
> Eric Botcazou


Attributes for var_decl and fun_decl.

2014-06-12 Thread Umesh Kalappa
Dear All,

We ported GCC 4.8.1 to custom hardware and we have target-specific
attributes such as io for variables, interrupt for functions, and many
more.

We are able to fetch the attributes for variables like

look_up(DECL_ATTRIBUTES(node),attr_name)

For typedef'd variables we fetch the attributes like


look_up(TYPE_ATTRIBUTES(node),attr_name)

which is not working as expected. Does the group have any hints on this?

Second, we are able to fetch the attributes for functions like
look_up(DECL_ATTRIBUTES(fndecl),attr_name) or
look_up(TYPE_ATTRIBUTES(fntype),attr_name)

But for a function pointer to such an attributed function, this does not work
as expected, e.g.
int __attribute((cdecl)) test(void);
int (*ptr) (void);
ptr = &test;

We are looking at the existing ports; meanwhile, any light from the experts
on this is appreciated.

Thank you
~Umesh
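
The lookup pattern described in this post, checking the declaration's attributes and then the type's, can be sketched in standalone C as follows. The `struct attr` list and both function names are hypothetical simplifications; inside GCC the attribute lists are tree nodes and the search is done by the compiler's own lookup routine. The sketch only shows the fallback order, on the assumption that an attribute written on a typedef ends up on the type rather than on the variable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute list node; GCC keeps attributes in a tree list. */
struct attr {
    const char *name;
    const struct attr *next;
};

/* Return the first attribute called NAME in LIST, or NULL. */
const struct attr *find_attr(const struct attr *list, const char *name)
{
    for (; list != NULL; list = list->next)
        if (strcmp(list->name, name) == 0)
            return list;
    return NULL;
}

/* Look on the declaration first, then fall back to its type. */
const struct attr *find_decl_or_type_attr(const struct attr *decl_attrs,
                                          const struct attr *type_attrs,
                                          const char *name)
{
    const struct attr *a = find_attr(decl_attrs, name);
    return a != NULL ? a : find_attr(type_attrs, name);
}
```

For the function-pointer case, the same reasoning suggests that the pointer's own pointed-to function type carries no attribute unless the pointer declaration repeats it, but that is an inference and not something the thread confirms.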


About Code coverage Algorithms.

2014-10-13 Thread Umesh Kalappa
Hi All,

Good day, everyone.

We benchmarked code coverage algorithms such as

a) Optimal Edge Profiling
(ftp://ftp.cs.wisc.edu/pub/techreports/1991/TR1031.pdf), which is
adopted by GCC and LLVM
b) Dominator Leaf
instrumentation (http://users.sdsc.edu/~mtikir/publications/papers/issta02.pdf),
for which I don't see a practical implementation.

The goal is to reduce the number of instrumentation points. When we benchmarked
both algorithms, the number of blocks that need to be instrumented
was lower with option b than with option a.

We need some expert insight on why option b is not more widely
used than option a.


Thank you
~Umesh
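
As background for option a, the core idea can be sketched in C under one assumption: that the technique is the spanning-tree form of edge profiling, where only the edges outside a spanning tree of the control-flow graph receive counters and the remaining counts are derived from flow conservation. The function and type names below are hypothetical, and the sketch picks tree edges in input order, whereas a real implementation would typically weight the tree toward frequently executed edges.

```c
#define MAX_NODES 64

struct cfg_edge { int src, dst; };

static int parent[MAX_NODES];

/* Union-find root lookup with path halving. */
static int find_root(int v)
{
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

/* Count the edges that fall outside a spanning tree of the CFG, treating
   edges as undirected.  Requires num_nodes <= MAX_NODES. */
int count_instrumented_edges(int num_nodes, const struct cfg_edge *edges,
                             int num_edges)
{
    int instrumented = 0;
    int i, a, b;

    for (i = 0; i < num_nodes; i++)
        parent[i] = i;
    for (i = 0; i < num_edges; i++) {
        a = find_root(edges[i].src);
        b = find_root(edges[i].dst);
        if (a == b)
            instrumented++;   /* closes a cycle: needs a counter */
        else
            parent[a] = b;    /* joins two components: tree edge */
    }
    return instrumented;
}
```

For a connected graph the result is always E - (V - 1), so a diamond-shaped CFG with a virtual exit-to-entry edge (4 nodes, 5 edges) needs 2 counters.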


error: ‘ggc_alloc_cleared_machine_function’ was not declared in this scope

2014-11-20 Thread Umesh Kalappa
Hi All ,

We tried building gcc 4.8.3 for mips as

$../src/gcc-4.8-2014.05/configure --target=mips   --enable-languages=c,c++

$make all-gcc

g++ -c   -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings
-Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long
-Wno-variadic-macros -Wno-overlength-strings   -DHAVE_CONFIG_H -I. -I.
-I../../src/gcc-4.8-2014.05/gcc -I../../src/gcc-4.8-2014.05/gcc/.
-I../../src/gcc-4.8-2014.05/gcc/../include
-I../../src/gcc-4.8-2014.05/gcc/../libcpp/include
-I../../src/gcc-4.8-2014.05/gcc/../libdecnumber
-I../../src/gcc-4.8-2014.05/gcc/../libdecnumber/dpd -I../libdecnumber
-I../../src/gcc-4.8-2014.05/gcc/../libbacktrace \
../../src/gcc-4.8-2014.05/gcc/config/mips/mips.c -o mips.o
../../src/gcc-4.8-2014.05/gcc/config/mips/mips.c: In function ‘bool
mips_cfun_use_frame_header_p()’:
../../src/gcc-4.8-2014.05/gcc/config/mips/mips.c:10058:27: warning:
variable ‘frame’ set but not used [-Wunused-but-set-variable]
../../src/gcc-4.8-2014.05/gcc/config/mips/mips.c: In function
‘machine_function* mips_init_machine_status()’:
../../src/gcc-4.8-2014.05/gcc/config/mips/mips.c:16912:46: error:
‘ggc_alloc_cleared_machine_function’ was not declared in this scope
In file included from ../../src/gcc-4.8-2014.05/gcc/config/mips/mips.c:19297:0:
../../src/gcc-4.8-2014.05/gcc/config/mips/mips.c:16913:1: warning:
control reaches end of non-void function [-Wreturn-type]
make[1]: *** [mips.o] Error 1
make[1]: Leaving directory
`/home/i16382/work/gcc_upgrade_4.8.3/4.8.3/microchip/build/gcc'
make: *** [all-gcc] Error 2

I googled for this, but found no help.

Can anyone please help me here? The above issue is blocking us :(.

Thank you in advance
~Umesh


Re: Optimized Allocation of Argument registers

2014-11-24 Thread Umesh Kalappa
Ajit,

Please check out the -fshrink-wrap option.


~Umesh

On Mon, Nov 24, 2014 at 5:17 PM, Ajit Kumar Agarwal
 wrote:
> All:
>
> The optimization of reducing save and restore of the callee and caller saved 
> register has been the attention Of
> increasing the performance of the benchmark. The callee saved registers is 
> saved at the entry and restore at the
> exit of the procedure if the register is reused inside the procedure whereas 
> the caller save registers at the Caller site
> is saved before the call and the restore after the variable is live and spans 
> through the call.
>
> The GCC port has done some optimization whereas the call-used registers are 
> live inside the procedure and has been
> set as 1 bit then it will not be saved and restored. This is based on the 
> data flow analysis.
>
> The callee saved registers is useful when there all multiple calls in the 
> call graph whereas the caller save registers are
> useful if the call is the leaf procedure then the saving before the call and 
> restore after the call will be useful  and increases
>  the performance.
>
> By traversing the call graph in depth-first-order and the bottom-up approach 
> we can propagate the save and restore
> At the procedure entry and exit to the upper regions of the call graph which 
> reduces the save and restore at all the lower
> Regions across the various lower calls. These decision can be made based on 
> the frequency of the call in the call graph as
> Proposed by Fred Chow.
>
> Another approach to reducing the save and restore at the procedure entry and 
> exit is moving the save and restore from
> The procedure entry to the active regions where the variable is Live inside 
> the procedure based on the data flow analysis
> And thus improve the performance of many benchmarks as proposed by Fred Chow.
>
> The propagation of save and restore from the lower regions of the call graph 
> to the upper regions is not implemented in
> The GCC framework and also the moving the save and restore to the active 
> region of Liveness inside the procedure from
> The entry and exit is not implemented inside the GCC framework. Can this be 
> proposed and implemented in GCC framework?
>
> For the open procedure whereas the indirect calls, recursive calls and the 
> external linkage calls cannot be optimized and in
> The open case the save and restore at the entry and exit of the procedure is 
> applied. But for the open procedure if all the
> Lower calls in the call-graph is closed and resolved through call-graph, the 
> save and restore can be propagate to the upper
> Region in the open procedures from the lower region of the calls which are 
> closed and resolved. This can also improve the
> Performance of many benchmarks and can this be proposed and implemented in 
> GCC framework?
>
> Let me know what do you think.
>
> Thanks & Regards
> Ajit
>
> -Original Message-
> From: Ajit Kumar Agarwal
> Sent: Tuesday, November 18, 2014 7:01 PM
> To: 'Vladimir Makarov'; gcc Mailing List
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: RE: Optimized Allocation of Argument registers
>
>
>
> -Original Message-
> From: Vladimir Makarov [mailto:vmaka...@redhat.com]
> Sent: Tuesday, November 18, 2014 1:57 AM
> To: Ajit Kumar Agarwal; gcc Mailing List
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: Optimized Allocation of Argument registers
>
> On 2014-11-17 8:13 AM, Ajit Kumar Agarwal wrote:
>> Hello All:
>>
>> I was looking at the optimized usage and allocation to argument registers. 
>> There are two aspects to it as follows.
>>
>> 1. We need to specify the argument registers as followed by ABI in the 
>> target specific code. Based on the function
>>   argument registers defined in the target dependent code the function 
>> argument registers are passed. If the
>>   number of argument registers defined in the Architecture is large say 6/8 
>> function argument registers.
>> Most of the time in the benchmarks we don't pass so many arguments and the 
>> number of arguments passed
>>   is quite less. Since we reserve the function arguments as specified
>> in the target specific code for the given architecture, leads to unoptimized 
>> usage as this function argument registers will not be used in the function.
>> Thus we need to steal some of the arguments registers and have the
>> usage of those in the function depending on the support of the number of 
>> function argument registers. The stealing of function argument registers will
>>   lead more number of registers available that are to be used in the 
>> function and leading to less spill and fetch.
>>
>
>>>The argument registers should be not reserved.  They should be present in 
>>>RTL and RA allocator will figure out itself when it can use them.
>>>That is how other ports work.
>
> Thanks Vladimir for Clarifications.
>
>> 2. The other aspect of the function argument register



forcing to emit absolute addresses in the .debug_loc section

2015-01-29 Thread Umesh Kalappa
Hi Guys,

I am very new to the DWARF debugging format. We recently migrated the
GCC compiler from 4.5.2 to the 4.8.3 toolchain, using the same binutils
2.23.51.

We are seeing a weird issue with the .debug_loc entries: the assembler
fails with the error below


/tmp/ccUj1tbg.s: Assembler messages:
/tmp/ccUj1tbg.s:778: Error: can't resolve `.LVL0' {.text section} -
`.text._ZN10__cxxabiv117__array_type_infoD2Ev' {*UND* section}
/tmp/ccUj1tbg.s:779: Error: can't resolve `.LVL1' {.text section} -
`.text._ZN10__cxxabiv117__array_type_infoD2Ev' {*UND* section}
/tmp/ccUj1tbg.s:782: Error: can't resolve `.LVL1' {.text section} -
`.text._ZN10__cxxabiv117__array_type_infoD2Ev' {*UND* section}

corresponding .debug_loc entries

.section.debug_loc,info
.Ldebug_loc0:
.LLST0:
.4byte  .LVL0-.text._ZN10__cxxabiv117__array_type_infoD2Ev
.4byte  .LVL1-1-.text._ZN10__cxxabiv117__array_type_infoD2Ev
.2byte  0x1
.byte   0x54
.4byte  .LVL1-1-.text._ZN10__cxxabiv117__array_type_infoD2Ev
.4byte  .LFE72-.text._ZN10__cxxabiv117__array_type_infoD2Ev
.2byte  0x4
.byte   0xf3
.uleb128 0x1
.byte   0x54
.byte   0x9f
.4byte  0
.4byte  0


Googling the issue turned up nothing :( . After going through the DWARF
format, we found that the .debug_loc entries above are relative, not
absolute. Please correct me if that assumption is wrong. We also need to
stick to the DWARF 2 format, not 3, 4 or 5.

Second, we tweaked the compiler to force absolute addresses, like this:

static void
output_loc_list (dw_loc_list_ref list_head)
{

      else if (/* !have_multiple_function_sections */ 0) /* our hack, and a weird one */
        {
          dw2_asm_output_delta (DWARF2_ADDR_SIZE, curr->begin, curr->section,
                                "Location list begin address (%s)",
                                list_head->ll_symbol);
          dw2_asm_output_delta (DWARF2_ADDR_SIZE, curr->end, curr->section,
                                "Location list end address (%s)",
                                list_head->ll_symbol);
        }
      else
        {
          dw2_asm_output_addr (DWARF2_ADDR_SIZE, curr->begin,
                               "Location list begin address (%s)",
                               list_head->ll_symbol);
          dw2_asm_output_addr (DWARF2_ADDR_SIZE, curr->end,
                               "Location list end address (%s)",
                               list_head->ll_symbol);
        }

}

now the .debug_loc section looks like

.section .debug_loc,info
.Ldebug_loc0:
.LLST0:
.4byte  .LVL0
.4byte  .LVL1-1
.2byte  0x1
.byte   0x54
.4byte  .LVL1-1
.4byte  .LFE72
.2byte  0x4
.byte   0xf3
.uleb128 0x1
.byte   0x54
.byte   0x9f
.4byte  0
.4byte  0

Now everything works, but we are still looking for the root cause and a
proper fix.


Please share any insights, suggestions or comments on this.

Thank you
~Umesh


libgcc_s_sjlj-1 and libstdc++-6 dependency...

2015-02-04 Thread Umesh Kalappa
Hi All,

I configured and built GCC 4.8.3 for Windows on Linux using MinGW.

The configure options were:

../../src45x/gcc/configure --build=i686-pc-linux-gnu --host=i686-w64-mingw32
--with-dwarf2 --with-newlib --with-gnu-as --with-gnu-ld
--enable-cxx-flags=-mno-smart-io --enable-lto --enable-fixed-point
--enable-gofast --enable-languages=c,c++ --enable-sgxx-sde-multilibs
--enable-sjlj-exceptions --enable-obsolete --disable-hosted-libstdcxx
--disable-libstdcxx-pch --disable-libssp --disable-libmudflap
--disable-libffi --disable-libfortran --disable-bootstrap
--disable-shared --disable-__cxa_atexit --disable-nls
--disable-libgomp --disable-threads --disable-sim
--disable-decimal-float --disable-libquadmath --without-headers
XGCC_FLAGS_FOR_TARGET=-frtti -fexceptions -fno-enforce-eh-specs

When we try to run the binary on Windows, it reports that


libstdc++-6.dll   and libgcc_s_sjlj-1  were not found


For now we have copied those DLLs into the binary's folder and things
work fine.

But we need to get rid of the DLL dependency, so any input or comments
on this issue would be highly appreciated.

Thank you
~Umesh


string constant of the constant pool entry..

2015-03-02 Thread Umesh Kalappa
Hi All,

I'm trying to fetch the string constant from the constant pool entry
for the symbol_ref rtx like

c sample

int i;
int main()
{
  printf("%d",i);
}

rtl is

(gdb) p debug_rtx(val)
(0xb7da4da0) (symbol_ref/f:SI ("*.LC0") [flags 0x2] )

corresponding asm

.section .rodata,code
.align  2
.LC0:
.ascii  "%d\000"


sample code to fetch the string "%d"

tree sym = SYMBOL_REF_DECL(rtx);

if (!(sym && (TREE_CODE(sym)==STRING_CST) && STRING_CST_CHECK(sym)))
sym = 0;

const char *string = TREE_STRING_POINTER(sym);

The sample code above fails: string ends up null.

What is wrong with the code above? Or how do we fetch the
string constant from the given symbol_ref?

Any hints will be appreciated, thank you.

FYI, the GCC code base is 4.8.3.

~Umesh


Re: string constant of the constant pool entry..

2015-03-04 Thread Umesh Kalappa
Thank you, Richard, for the input.

~Umesh

On Wed, Mar 4, 2015 at 3:29 AM, Richard Sandiford
 wrote:
> Umesh Kalappa  writes:
>> Hi All,
>>
>> I'm trying to fetch the string constant from the constant pool entry
>> for the symbol_ref rtx like
>>
>> c sample
>>
>> int i;
>> int main()
>> {
>>   printf("%d",i);
>> }
>>
>> rtl is
>>
>> (gdb) p debug_rtx(val)
>> (symbol_ref/f:SI ("*.LC0") [flags 0x2] )
>
> The SYMBOL_REF_DECL is a VAR_DECL whose DECL_INITIAL is the constant.  So:
>
>> corresponding asm
>>
>>.section.rodata,code
>> .align  2
>> .LC0:
>> .ascii  "%d\000"
>>
>>
>> sample code to fetch the string "%d"
>>
>> tree sym = SYMBOL_REF_DECL(rtx);
>>
>> if (!(sym && (TREE_CODE(sym)==STRING_CST) && STRING_CST_CHECK(sym)))
>> sym = 0;
>
> ...I think you want:
>
>   if (TREE_CONSTANT_POOL_ADDRESS_P (symbol))
> {
>   tree str = DECL_INITIAL (SYMBOL_REF_DECL (symbol));
>   if (TREE_CODE (str) == STRING_CST)
> ...
> }
>
> (STRING_CST_CHECK is really local to the tree.h macros, it shouldn't
> be used elsewhere.)
>
> Thanks,
> Richard


Unnamed Struct / Union

2015-03-23 Thread Umesh Kalappa
Hi All ,

GCC 4.8.3 reports the error below

/home/i16382/an.c:13:18: error: duplicate member 'bOriginator'
 unsigned bOriginator;
  ^

for the case

union
{
struct
{
unsigned bStatusType;
unsigned bOriginator;
};
struct
{
unsigned originator;
unsigned memoryContentsChanged;
unsigned interruptPending;
unsigned bOriginator;
};
} USB_WORD;

Is that the expected behaviour?

Thank you
~Umesh
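
For what it is worth, here is a sketch of why the error looks expected,
assuming C11 anonymous struct semantics (which GCC also accepts as an
extension in older modes): the members of both unnamed structs become
members of the enclosing union, so the two bOriginator fields collide.
The usb_word tag, the rename to bOriginator2 and the helper
check_usb_word_layout are hypothetical, only there to make the example
compile.

```c
#include <assert.h>
#include <stddef.h>

/* Members of anonymous structs are looked up as if they were members of the
   enclosing union, so every name must be unique across all of them.
   bOriginator2 is a hypothetical rename, not a name from the original code. */
union usb_word
{
  struct
  {
    unsigned bStatusType;
    unsigned bOriginator;
  };
  struct
  {
    unsigned originator;
    unsigned memoryContentsChanged;
    unsigned interruptPending;
    unsigned bOriginator2;      /* was a second 'bOriginator' */
  };
};

/* Checks that the two anonymous structs overlay each other as expected. */
void
check_usb_word_layout (void)
{
  union usb_word w;

  w.originator = 7u;            /* first member of the second struct */
  assert (offsetof (union usb_word, bStatusType) == 0);
  assert (offsetof (union usb_word, originator) == 0);
  assert (w.bStatusType == 7u); /* same storage as 'originator' */
}
```

If the two fields are really meant to be the same datum, giving the
inner structs names (and accessing them as w.first.bOriginator, for
instance) is another way out.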


ldm/stm bus error

2015-05-18 Thread Umesh Kalappa
Hi All,

I am getting a bus/hard fault for the case below, which makes sense since
ldm/stm expect the address to be word aligned.

bash-4.1$ cat test.c
struct test
{
    char c;
    int i;
} __attribute__((packed));

struct test a,b;

int main()
{
    a = b; // here the compiler cannot be sure that a or b is word aligned
    return a.i;
}

bash-4.1$ arm-eabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-eabi-gcc
COLLECT_LTO_WRAPPER=/nobackup/ukalappa/build/gcc/mv-ga/c4.7.0-p1/x86_64-linux/libexec/gcc/arm-eabi/4.7.0/lto-wrapper
Target: arm-eabi
Configured with: /nobackup/ukalappa/src/gcc/mv-ga/gcc/configure
--srcdir=/nobackup/ukalappa/src/gcc/mv-ga/gcc --build=x86_64-linux
--target=arm-eabi --host=x86_64-linux
--prefix=/nobackup/ukalappa/build/gcc/mv-ga/c4.7.0-p1
--exec-prefix=/nobackup/ukalappa/build/gcc/mv-ga/c4.7.0-p1/x86_64-linux
--with-pkgversion='Cisco GCC c4.7.0-p1' --with-cisco-patch-level=1
--with-cisco-patch-level-minor=0
--with-bugurl=http://wwwin.cisco.com/it/services/
--disable-maintainer-mode --enable-languages=c,c++ --disable-nls
Thread model: single
gcc version 4.7.0

bash-4.1$ ./arm-eabi-gcc -march=armv7 -mthumb  -S test.c

bash-4.1$ cat test.s
.syntax unified
.arch armv7
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.thumb
.file   "test.c"
.comm   a,5,4
.comm   b,5,4
.text
.align  2
.global main
.thumb
.thumb_func
.type   main, %function
main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 1, uses_anonymous_args = 0
@ link register save eliminated.
push    {r7}
add     r7, sp, #0
movw    r3, #:lower16:a
movt    r3, #:upper16:a
movw    r2, #:lower16:b
movt    r2, #:upper16:b
ldmia   r2, {r0, r1}   //Bus error
str     r0, [r3]
adds    r3, r3, #4
strb    r1, [r3]
movw    r3, #:lower16:a
movt    r3, #:upper16:a
ldr     r3, [r3, #1]    @ unaligned
mov     r0, r3
mov     sp, r7
pop     {r7}
bx      lr
.size   main, .-main


ARM states that the base address for ldm/stm must be word aligned, so
generating ldm/stm in the case above looks like a compiler bug. Do you
agree, or am I missing something here?


Thank you
~Umesh
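
A small sketch for checking the layout assumptions in this report,
assuming a GCC-compatible compiler and a 4-byte int. It shows that the
packed struct is 5 bytes with i at offset 1, and copies and reads it
through memcpy so that no alignment is assumed in the source. The
helper names copy_test and read_i are hypothetical. Note that this says
nothing about where the linker places a and b; that is decided by the
.comm alignment in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct test
{
  char c;
  int i;
} __attribute__ ((packed));

/* Copy through memcpy so the source code assumes no alignment for src or dst. */
void
copy_test (struct test *dst, const struct test *src)
{
  memcpy (dst, src, sizeof *dst);
}

/* Read the misaligned int member without forming an int pointer to it. */
int
read_i (const struct test *p)
{
  int v;

  memcpy (&v, (const char *) p + offsetof (struct test, i), sizeof v);
  return v;
}
```

The compiler is still free to lower memcpy to multi-word loads when it
can prove the alignment of the objects involved, so this is a way to
state intent, not a guarantee about the instructions emitted.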


Re: ldm/stm bus error

2015-05-18 Thread Umesh Kalappa
Thank you all for the replies; I appreciate the detailed summary.

~Umesh

On Mon, May 18, 2015 at 10:08 PM, Richard Earnshaw
 wrote:
> On 18/05/15 17:18, Joey Ye wrote:
>> In this case ldm is loading from an aligned address. It just loads
>> more than sizeof a. So it may be the bus that does not permit
>> accessing memory beyond the address range of a. In such a case I don't
>> believe the compiler is doing anything wrong.
>>
>
> If a starts on a 4-byte aligned boundary and is 5 bytes long,
> then a+7 must still be within the same word of memory as the last byte
> of a (similarly for b).  Neither of these addresses can fault, even if
> they are beyond the end of the object itself.  Anyway, such a fault
> would be a segmentation fault, not a bus error.
>
> R.
>
>> On Mon, May 18, 2015 at 4:50 PM, Richard Earnshaw
>>  wrote:
>>> On 18/05/15 10:05, Umesh Kalappa wrote:
>>>> Hi All,
>>>>
>>>> Getting a bus/hard error for the below case ,make sense since ldm/stm
>>>> expects the address to be word aligned .
>>>>
>>>> bash-4.1$ cat test.c
>>>> struct test
>>>> {
>>>> char c;
>>>> int i;
>>>> } __attribute__((packed));
>>>>
>>>> struct test a,b;
>>>>
>>>> int main()
>>>> {
>>>> a =b ; //here compiler is not sure that a or b is word aligned
>>>> return a.i;
>>>> }
>>>>
>>>> bash-4.1$ arm-eabi-gcc -v
>>>> Using built-in specs.
>>>> COLLECT_GCC=arm-eabi-gcc
>>>> COLLECT_LTO_WRAPPER=/nobackup/ukalappa/build/gcc/mv-ga/c4.7.0-p1/x86_64-linux/libexec/gcc/arm-eabi/4.7.0/lto-wrapper
>>>> Target: arm-eabi
>>>> Configured with: /nobackup/ukalappa/src/gcc/mv-ga/gcc/configure
>>>> --srcdir=/nobackup/ukalappa/src/gcc/mv-ga/gcc --build=x86_64-linux
>>>> --target=arm-eabi --host=x86_64-linux
>>>> --prefix=/nobackup/ukalappa/build/gcc/mv-ga/c4.7.0-p1
>>>> --exec-prefix=/nobackup/ukalappa/build/gcc/mv-ga/c4.7.0-p1/x86_64-linux
>>>> --with-pkgversion='Cisco GCC c4.7.0-p1' --with-cisco-patch-level=1
>>>> --with-cisco-patch-level-minor=0
>>>> --with-bugurl=http://wwwin.cisco.com/it/services/
>>>> --disable-maintainer-mode --enable-languages=c,c++ --disable-nls
>>>> Thread model: single
>>>> gcc version 4.7.0
>>>>
>>>> bash-4.1$ ./arm-eabi-gcc -march=armv7 -mthumb  -S test.c
>>>>
>>>> bash-4.1$ cat test.s
>>>> .syntax unified
>>>> .arch armv7
>>>> .fpu softvfp
>>>> .eabi_attribute 20, 1
>>>> .eabi_attribute 21, 1
>>>> .eabi_attribute 23, 3
>>>> .eabi_attribute 24, 1
>>>> .eabi_attribute 25, 1
>>>> .eabi_attribute 26, 1
>>>> .eabi_attribute 30, 6
>>>> .eabi_attribute 34, 1
>>>> .eabi_attribute 18, 4
>>>> .thumb
>>>> .file   "test.c"
>>>> .comm   a,5,4
>>>> .comm   b,5,4
>>>
>>> The above two lines create (common) instances of a and b that are 4-byte
>>> aligned, so the LDM should not be faulting in this case, unless your
>>> binutils have ignored the alignment constraints.
>>>
>>> I don't think the compiler has done the wrong thing here.
>>>
>>> R.
>>>
>>>> .text
>>>> .align  2
>>>> .global main
>>>> .thumb
>>>> .thumb_func
>>>> .type   main, %function
>>>> main:
>>>> @ args = 0, pretend = 0, frame = 0
>>>> @ frame_needed = 1, uses_anonymous_args = 0
>>>> @ link register save eliminated.
>>>> push    {r7}
>>>> add     r7, sp, #0
>>>> movw    r3, #:lower16:a
>>>> movt    r3, #:upper16:a
>>>> movw    r2, #:lower16:b
>>>> movt    r2, #:upper16:b
>>>> ldmia   r2, {r0, r1}   //Bus error
>>>> str     r0, [r3]
>>>> adds    r3, r3, #4
>>>> strb    r1, [r3]
>>>> movw    r3, #:lower16:a
>>>> movt    r3, #:upper16:a
>>>> ldr     r3, [r3, #1]    @ unaligned
>>>> mov     r0, r3
>>>> mov     sp, r7
>>>> pop     {r7}
>>>> bx      lr
>>>> .size   main, .-main
>>>>
>>>>
>>>> Arm states that ldm/stm should be word aligned and generating ldm/stm
>>>> in the above case is the compiler error/bug ,do you guys agree with me
>>>> or i'm missing something here ?
>>>>
>>>>
>>>> Thank you
>>>> ~Umesh
>>>>
>>>
>


warning: conversion from ‘int’ to ‘char’ may change value

2018-09-17 Thread Umesh Kalappa
Hi All,

When we compile the case below with trunk GCC, we get the following
warning (-Wconversion):

void start(void) {
 char n = 1;
 char n1 = 0x01;
 n &=  ~n1;
}

$xgcc -S  warn.c -nostdinc -Wconversion
 warning: conversion from ‘int’ to ‘char’ may change value [-Wconversion]
  n &=  ~n1;

Casting the expression, as in "n &= (char)~n1", makes the warning go away.

When we investigated the GCC source, we found that the warning comes from
unsafe_conversion_p at gcc/c-family/c-common.c:1226:

if (TYPE_PRECISION (type) < TYPE_PRECISION (expr_type))
give_warning = UNSAFE_OTHER;

where TYPE_PRECISION (type) is 8 for char and TYPE_PRECISION
(expr_type) is 32, as expected for int.

Is that the expected behaviour of GCC?

Clang compiles this with no warnings.

Thank you
~Umesh
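
A minimal sketch of what happens in that expression, assuming the usual
C integer promotions: the operand of ~ is promoted to int, so ~n1 has
type int, and assigning it back to a char narrows it. The function name
clear_bits is hypothetical; the cast is the one mentioned above.

```c
#include <assert.h>

/* Clear in 'n' the bits that are set in 'mask'.  The operand of ~ is
   promoted to int, so ~mask has type int; the cast narrows it back
   explicitly before the compound assignment. */
char
clear_bits (char n, char mask)
{
  n &= (char) ~mask;
  return n;
}
```

For example, clear_bits (3, 1) yields 2 whether plain char is signed or
unsigned, because only the low 8 bits of ~mask take part in the result.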


MIPS: delay slot filler with store.

2018-10-25 Thread Umesh Kalappa
Hi All,

For the below C code

Test.u32pt = u32PtLen;
Test.u32pn = u32PtCnt;
Test.pstpk = pstPt;
Test.psteo = pstEgrInfo;
Test.e = 1;
Test.pstfi = pstFi ;

return foo(&Test, AclAction);

where "Test" is a struct.

The generated code for MIPS (with -fno-delayed-branch):

   Test.u32pt = u32PtLen;
   370:afb50084 sw  s5,132(sp)
   Test.u32pn = u32PtCnt;
   374:afb60080 sw  s6,128(sp)
   Test.pstpk = pstPt;
   Test.psteo = pstEgrInfo;
   Test.e = 1;
   Test.pstfi = pstFi ;

   return foo(&Test, AclAction)
   378:0c00 jal 0 
   37c: nop

With -fdelayed-branch (gcc 4.8.1) the generated code is:

 Test.u32pt = u32PtLen;
  370:afb50084 sw  s5,132(sp)
 Test.pstpk = pstPt;
 Test.psteo = pstEgrInfo;
 Test.e = 1;
 Test.pstfi = pstFi ;

  return foo(&Test, AclAction)
  378:0c00 jal 0 
  Test.u32pn = u32PtCnt;
  374:afb60080 sw  s6,128(sp)

Can the filler place the "sw  s6,128(sp)" in the delay slot? Is that
legal, and if not, why not?

  Thank you
  ~Umesh


Power 64 ELFv2 w.r.t toc(cmodel=medium) on windows.

2018-10-25 Thread Umesh Kalappa
Hi All,

For the below code (test.c)

int foo()
{
  printf("Hello World");
}

On linux :
ccpc -mcpu=e6500 -mno-altivec -mabi=no-altivec -D_WRS_HARDWARE_FP
-mabi=elfv2 -mcmodel=med -mhard-float -S test.c

linux asm :
the constant string is fetched like this:

 addis 3,2,.LC0@toc@ha
 addi 3,3,.LC0@toc@l

Here a signed 32-bit offset relative to the TOC base is used on Linux, as
expected for the medium code model, and gas generates the relocation
entries R_PPC64_TOC16_HA and R_PPC64_TOC16_LO.

For Windows:

The same command gives asm like

la 3,.LC0@toc(2)

where a signed 16-bit offset relative to the TOC base is used. Why is that?
The relocation entry will be R_PPC64_TOC16 (signed 16-bit offset).

To understand the difference we grepped the .md files and found
patterns (rs6000.md) like

;; Largetoc support
(define_insn "*largetoc_high"
  [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
(high:DI
  (unspec [(match_operand:DI 1 "" "")
   (match_operand:DI 2 "gpc_reg_operand" "b")]
  UNSPEC_TOCREL)))]
   "TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL"
   "addis %0,%2,%1@toc@ha")

(define_insn "*largetoc_high_aix"
  [(set (match_operand:P 0 "gpc_reg_operand" "=b*r")
(high:P
  (unspec [(match_operand:P 1 "" "")
   (match_operand:P 2 "gpc_reg_operand" "b")]
  UNSPEC_TOCREL)))]
   "TARGET_XCOFF && TARGET_CMODEL != CMODEL_SMALL"
   "addis %0,%1@u(%2)")

The patterns above explain the difference between Windows and Linux.

The question to the experts: using the medium code model, how can we get
the same Linux semantics on Windows (without source code changes)?

Or are the patterns kept distinct for a reason that we are missing here?

Thank you; awaiting any comments
~Umesh
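
To make the 32-bit versus 16-bit offset point concrete, here is a
sketch of the arithmetic behind the @toc@ha / @toc@l pair compared with
a single 16-bit @toc displacement. The function names are hypothetical;
the only assumption is the usual PowerPC convention that the low half
is sign-extended by addi, so the high half has to be adjusted to
compensate.

```c
#include <assert.h>
#include <stdint.h>

/* Low half (@l) as the sign-extended 16-bit immediate that addi sees. */
int32_t
toc_lo (int32_t off)
{
  int32_t lo = off & 0xffff;

  if (lo >= 0x8000)
    lo -= 0x10000;
  return lo;
}

/* High-adjusted half (@ha): compensates for the sign extension of @l. */
int32_t
toc_ha (int32_t off)
{
  return (int32_t) (((int64_t) off - toc_lo (off)) / 65536);
}

/* Non-zero when one signed 16-bit displacement, as in "la 3,sym@toc(2)",
   can reach the offset. */
int
fits_toc16 (int32_t off)
{
  return off >= -32768 && off <= 32767;
}
```

For any offset, toc_ha (off) * 65536 + toc_lo (off) gives back off,
which is what the addis/addi pair computes on top of r2. A lone 16-bit
displacement only covers the range accepted by fits_toc16.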


Re: Power 64 ELFv2 w.r.t toc(cmodel=medium) on windows.

2018-10-25 Thread Umesh Kalappa
CCing the maintainers, David Edelsohn and Segher Boessenkool.

Any suggestions or comments on the query below?
Thank you
~Umesh
On Thu, Oct 25, 2018 at 9:23 PM Umesh Kalappa  wrote:
>
> Hi All,
>
> For the below code (test.c)
>
> int foo()
> {
>   printf("Hello World");
> }
>
> On linux :
> ccpc -mcpu=e6500 -mno-altivec -mabi=no-altivec -D_WRS_HARDWARE_FP
> -mabi=elfv2 -mcmodel=med -mhard-float -S test.c
>
> linux asm :
> the constant string fetched like
>
>  addis 3,2,.LC0@toc@ha
>  addi 3,3,.LC0@toc@l
>
> where offset  signed 32 bit used  relatively to  toc base  on linux as
> expected  for the  medium code model .
> and the relocation entry will be generated by gas  :
> R_PPC64_TOC16_HA   and  R_PPC64_TOC16_LO
>
> For Windows :
>
> same command  and windows asm looks like
>
> la 3,.LC0@toc(2)
>
> where offset used  signed 16  bit used  relatively to  toc base  why it so ?.
> and the relocation entry will be :
> R_PPC64_TOC16  (signed 16 bit offset )
>
> why this difference and when we greping the .md file and  we found
> patterns (rs6000.md ) like
>
> ;; Largetoc support
> (define_insn "*largetoc_high"
>   [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
> (high:DI
>   (unspec [(match_operand:DI 1 "" "")
>(match_operand:DI 2 "gpc_reg_operand" "b")]
>   UNSPEC_TOCREL)))]
>"TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL"
>"addis %0,%2,%1@toc@ha")
>
> (define_insn "*largetoc_high_aix"
>   [(set (match_operand:P 0 "gpc_reg_operand" "=b*r")
> (high:P
>   (unspec [(match_operand:P 1 "" "")
>(match_operand:P 2 "gpc_reg_operand" "b")]
>   UNSPEC_TOCREL)))]
>"TARGET_XCOFF && TARGET_CMODEL != CMODEL_SMALL"
>"addis %0,%1@u(%2)")
>
> the above patterns  answered the difference b/w windows and linux .
>
> Questions to the expert  is that using the medium code model ,how we
> can get the same linux semantics  on windows (without source code
> changes) ?
>
> or above distinguish patterns  for  a reason and which we are missing here  ?
>
> Thank you and awaiting for any comments
> ~Umesh


Re: Power 64 ELFv2 w.r.t toc(cmodel=medium) on windows.

2018-10-26 Thread Umesh Kalappa
Thank you David for the information.
>>Are you asking about semantics or syntax?  Which source code do you
not want to change?
My bad, I was not clear the first time. The question was: why is the
relocation R_PPC64_TOC16 generated for a global access in the PE format,
while on ELF R_PPC64_TOC16_HA and R_PPC64_TOC16_LO are generated for the
same global access? Why this difference?

Irrespective of the object file format, shouldn't the relocations be the
same for a given target?

For more info, we are using binutils (2.29) and we run the assembler like
$as.exe -v -a64 -me6500 -many -mbig -o test.o test.s

Let the compiler (GCC) emit the PE syntax, but we need the assembler to
emit relocations like ELF for -mcmodel=medium/large. Is that possible?

Or, as you suggested, should we change the compiler to emit the ELF
syntax for global accesses?

Thank you again
~Umesh


On Fri, Oct 26, 2018 at 6:57 PM David Edelsohn  wrote:
>
> On Thu, Oct 25, 2018 at 11:53 AM Umesh Kalappa  
> wrote:
> >
> > Hi All,
> >
> > For the below code (test.c)
> >
> > int foo()
> > {
> >   printf("Hello World");
> > }
> >
> > On linux :
> > ccpc -mcpu=e6500 -mno-altivec -mabi=no-altivec -D_WRS_HARDWARE_FP
> > -mabi=elfv2 -mcmodel=med -mhard-float -S test.c
> >
> > linux asm :
> > the constant string fetched like
> >
> >  addis 3,2,.LC0@toc@ha
> >  addi 3,3,.LC0@toc@l
> >
> > where offset  signed 32 bit used  relatively to  toc base  on linux as
> > expected  for the  medium code model .
> > and the relocation entry will be generated by gas  :
> > R_PPC64_TOC16_HA   and  R_PPC64_TOC16_LO
> >
> > For Windows :
> >
> > same command  and windows asm looks like
> >
> > la 3,.LC0@toc(2)
> >
> > where offset used  signed 16  bit used  relatively to  toc base  why it so 
> > ?.
> > and the relocation entry will be :
> > R_PPC64_TOC16  (signed 16 bit offset )
> >
> > why this difference and when we greping the .md file and  we found
> > patterns (rs6000.md ) like
> >
> > ;; Largetoc support
> > (define_insn "*largetoc_high"
> >   [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
> > (high:DI
> >   (unspec [(match_operand:DI 1 "" "")
> >(match_operand:DI 2 "gpc_reg_operand" "b")]
> >   UNSPEC_TOCREL)))]
> >"TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL"
> >"addis %0,%2,%1@toc@ha")
> >
> > (define_insn "*largetoc_high_aix"
> >   [(set (match_operand:P 0 "gpc_reg_operand" "=b*r")
> > (high:P
> >   (unspec [(match_operand:P 1 "" "")
> >(match_operand:P 2 "gpc_reg_operand" "b")]
> >   UNSPEC_TOCREL)))]
> >"TARGET_XCOFF && TARGET_CMODEL != CMODEL_SMALL"
> >"addis %0,%1@u(%2)")
> >
> > the above patterns  answered the difference b/w windows and linux .
> >
> > Questions to the expert  is that using the medium code model ,how we
> > can get the same linux semantics  on windows (without source code
> > changes) ?
> >
> > or above distinguish patterns  for  a reason and which we are missing here  
> > ?
>
> Linux uses the ELF file format and assembler syntax.  AIX uses the AIX
> file format and assembler syntax.
>
> Windows uses PE file format and syntax, which is not supported in the
> rs6000 or powerpcspe ports.
>
> Are you asking about semantics or syntax?  Which source code do you
> not want to change?  If you want to target PE assembler, GCC needs to
> be taught about that syntax, or at least it needs to generate the ELF
> syntax for Windows PE.
>
> Thanks, David


[PowerPC 64]r12 is not updated to GEP when control transferred from virtual thunk function .

2019-05-15 Thread Umesh Kalappa
Hi All,

We have a situation where r12 points to the thunk's GEP (global entry
point), not to that of the current function, for example:

   .size   _ZN12Intermediate1vEv,.-_ZN12Intermediate1vEv
.set    .LTHUNK0,_ZN12Intermediate1vEv
.align 2
.globl _ZThn8_N12Intermediate1vEv
.type   _ZThn8_N12Intermediate1vEv, @function
_ZThn8_N12Intermediate1vEv:
.LFB27:
.file 2 "/home/vkumar1/tmp/64_bit/qsp_ppc/gnu/dkmcxx/lib.h"
.loc 2 13 16
.cfi_startproc
.LCF2:
0:  addis 2,12,.TOC.-.LCF2@ha
addi 2,2,.TOC.-.LCF2@l
.localentry _ZThn8_N12Intermediate1vEv,.-_ZThn8_N12Intermediate1vEv
addi 3,3,-8
b .LTHUNK0
.cfi_endproc
.LFE27:
.size   _ZThn8_N12Intermediate1vEv,.-_ZThn8_N12Intermediate1vEv
.section        ".toc","aw"
.set .LC1,.LC0

.section        ".text"
.align 2
.globl _ZN12Intermediate1vEv
.type   _ZN12Intermediate1vEv, @function
_ZN12Intermediate1vEv:
.LFB25:
.loc 1 7 23
.cfi_startproc
.LCF1:
0:  addis 2,12,.TOC.-.LCF1@ha
addi 2,2,.TOC.-.LCF1@l
.localentry _ZN12Intermediate1vEv,.-_ZN12Intermediate1vEv
mflr 0
std 0,16(1)
std 31,-8(1)
stdu 1,-64(1)

As shown above, control is transferred from "_ZThn8_N12Intermediate1vEv"
(the support function that adjusts the this pointer) to
"_ZN12Intermediate1vEv" by a b instruction, which does not update r12.
At the beginning of "_ZN12Intermediate1vEv" the TOC base is then computed
from r12, which is incorrect. We are investigating the issue. One way to
fix it is to make the thunk update r12 and transfer control with
something like bctrl; another is to load r12 with the function address in
the _ZN12Intermediate1vEv prologue code.

But before we go ahead, please share your thoughts or shed some light
on this.

Thank you
~Umesh


Re: [PowerPC 64]r12 is not updated to GEP when control transferred from virtual thunk function .

2019-05-15 Thread Umesh Kalappa
Thank you Eric for the suggestion. Say we add support for this in the
loader: can you please point to the part of the ELFv2 reference that
specifies the behaviour for this specific case?

~Umesh

On Wed, May 15, 2019, 21:35 Eric Botcazou  wrote:

> > like above the control  from "_ZThn8_N12Intermediate1vEv" (support
> > function for this pointer update)  is transferred
> > "_ZN12Intermediate1vEv" by b  inst (where its not updating the r12)
> > and in the beginning  of   "_ZN12Intermediate1vEv" we are loading the
> > toc base from r12 (which is incorrect ) ,we are investigating the
> > issue and one way to fix the issue is that make THUNK to update the
> > r12 ,the cal like bctrl  or  load the r12 with the function address in
> > the _ZN12Intermediate1vEv prologue code .
>
> Is that on VxWorks in kernel mode?  If so, the loader doesn't abide by the
> ELFv2 ABI so the simple way out is to disable asm thunks altogether:
>
> #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
> #define TARGET_ASM_CAN_OUTPUT_MI_THUNK rs6000_can_output_mi_thunk
>
> /* Return true if rs6000_output_mi_thunk would be able to output the
>assembler code for the thunk function specified by the arguments
>it is passed, and false otherwise.  */
>
> static bool
> rs6000_can_output_mi_thunk (const_tree, HOST_WIDE_INT, HOST_WIDE_INT,
> const_tree)
> {
>   /* The only possible issue is for VxWorks in kernel mode.  */
>   if (!TARGET_VXWORKS || TARGET_VXWORKS_RTP)
> return true;
>
>   /* The loader neither creates the glue code sequence that loads r12 nor
> uses
>  the local entry point for the sibcall's target in the ELFv2 ABI.  */
>   return DEFAULT_ABI != ABI_ELFv2;
> }
>
> --
> Eric Botcazou
>


Re: [PowerPC 64]r12 is not updated to GEP when control transferred from virtual thunk function .

2019-05-15 Thread Umesh Kalappa
>>Can't you get the loader fixed, instead?
Yes, we are thinking the same. The question is what the loader semantics
should be here: patch the prologue code so that r12 is updated to the
global entry point, or set r2 to the TOC base directly (so that it does
not rely on r12)?

~Umesh
On Thu, May 16, 2019, 05:22 Segher Boessenkool 
wrote:

> On Wed, May 15, 2019 at 08:31:27PM +0200, Eric Botcazou wrote:
> > > Thank you Eric for the suggestion and say that we support in the
> loader
> > > part ,can you please point on elfv2 reference that says implementation
> for
> > > this specific case.
> >
> > I don't think there is a reference to this specific case in the ABI,
> only the
> > general stuff about local and global entry points.
>
> Yes, it is just a normal jump to a local function.
>
> > We have had this patch in
> > our tree for some time and it works well, so let me submit it for
> inclusion in
> > the official tree.
>
> Can't you get the loader fixed, instead?
>
>
> Segher
>


Re: [PowerPC 64]r12 is not updated to GEP when control transferred from virtual thunk function .

2019-05-16 Thread Umesh Kalappa
Hi Segher and Eric,

We are very new to the Power ABI, and we are thinking of handling this
case in the loader: go through relocations such as R_PPC64_REL24, and
when the symbol has a localentry, compute the delta (GEP - LEP) and patch
the call target to (sym.value - delta).

Thank you
~Umesh
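
A rough sketch of the delta computation described above, under the
assumption (from my reading of the ELFv2 ABI) that the distance between
the global and local entry points is encoded in the three most
significant bits of the symbol's st_other field. The macro and function
names are local to this example and hypothetical; a real loader would
take the constants from its ELF headers.

```c
#include <assert.h>
#include <stdint.h>

/* Local definitions for illustration only: bits 5..7 of st_other. */
#define STO_LOCAL_BIT   5
#define STO_LOCAL_MASK  (7u << STO_LOCAL_BIT)

/* Byte distance from the global entry point (GEP) to the local entry
   point (LEP).  Field values 0 and 1 mean the two coincide; 2..6 give
   4, 8, 16, 32 and 64 bytes.  Value 7 is assumed reserved and is not
   handled specially here. */
unsigned
local_entry_offset (unsigned char st_other)
{
  unsigned field = (st_other & STO_LOCAL_MASK) >> STO_LOCAL_BIT;

  return ((1u << field) >> 2) << 2;
}

/* LEP address, given the symbol value (the GEP) and st_other. */
uint64_t
local_entry (uint64_t sym_value, unsigned char st_other)
{
  return sym_value + local_entry_offset (st_other);
}
```

In the listing earlier in this thread the global entry sequence is two
instructions (addis, addi), i.e. 8 bytes, which would correspond to a
field value of 3. Since the delta (GEP - LEP) is negative, sym.value -
delta is the same address as local_entry () returns.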

On Thu, May 16, 2019 at 9:00 AM Umesh Kalappa  wrote:
>
> >>Can't you get the loader fixed, instead?
> Yes we are thinking the same ,question what should be loader semantics here 
> (update the prologue code to update r12 to Global Entry Point or Update R2 
> with toc base (that don't relay on the R12). )
>
> ~Umesh
> On Thu, May 16, 2019, 05:22 Segher Boessenkool  
> wrote:
>>
>> On Wed, May 15, 2019 at 08:31:27PM +0200, Eric Botcazou wrote:
>> > > Thank you Eric for the suggestion and say that we support in the  loader
>> > > part ,can you please point on elfv2 reference that says implementation 
>> > > for
>> > > this specific case.
>> >
>> > I don't think there is a reference to this specific case in the ABI, only 
>> > the
>> > general stuff about local and global entry points.
>>
>> Yes, it is just a normal jump to a local function.
>>
>> > We have had this patch in
>> > our tree for some time and it works well, so let me submit it for 
>> > inclusion in
>> > the official tree.
>>
>> Can't you get the loader fixed, instead?
>>
>>
>> Segher


Re: [PowerPC 64]r12 is not updated to GEP when control transferred from virtual thunk function .

2019-05-16 Thread Umesh Kalappa
Hi Segher,

Please refer to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90513 .

Thank you
~Umesh

On Fri, May 17, 2019 at 4:22 AM Segher Boessenkool
 wrote:
>
> Hi Umesh,
>
> On Thu, May 16, 2019 at 06:12:48PM +0530, Umesh Kalappa wrote:
> > We are very new to Power abi and we are thinking to handle this case
> > in loader  like  go through the  relocations like R_PPC64_REL24 and
> > found symbol has the localentry ,then compute the delta (GEP - LEP )
> > and patch the caller address like (sym.value - delta).
>
> I wonder if you have found a bug in the compiler after all.  Most things
> are supposed to work without the linker/loader having to do special
> things; e.g. using the global entry point should always work, using the
> local entry point is just an optimisation.
>
> Please open a PR so we can investigate?
>
>
> Segher


Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

2018-05-04 Thread Umesh Kalappa
Hi Alex ,

I agree that float division doesn't touch memory, but the fdiv result
(stack register) is stored back to memory, i.e. fResult.

So shouldn't the compiler barrier in the inline asm (the "memory"
clobber) prevent instructions like "fstps   fResult(%rip)" from sinking
below the fence?

BTW, if we make fDivident and fResult = 0.0f globals, the emitted code
looks OK, i.e.
#gcc -S test.c -O3  -mmmx -mno-sse

flds    .LC0(%rip)
fsts    fDivident(%rip)
fdivs   .LC1(%rip)
fstps   fResult(%rip)
#APP
# 10 "test.c" 1
mfence
# 0 "" 2
#NO_APP
flds    fResult(%rip)
movl    $.LC2, %edi
xorl    %eax, %eax
fstpl   (%rsp)
call    printf

So I strongly believe that this is a compiler issue; please feel free
to correct me.

Thank you, waiting for your reply.

~Umesh




On Fri, Apr 13, 2018 at 5:58 PM, Alexander Monakov  wrote:
> On Fri, 13 Apr 2018, Vivek Kinhekar wrote:
>> The mfence instruction with memory clobber asm instruction should create a
>> barrier between division and printf instructions.
>
> No, floating-point division does not touch memory, so the asm does not (and
> need not) restrict its motion.
>
> Alexander


Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

2018-05-07 Thread Umesh Kalappa
CCed Jakub,


> Hi Alex,
> Agree that float division don't touch memory ,but fdiv  result (stack
> register ) is stored  back to a memory i.e fResult .
>
> So compiler barrier in the inline asm i.e ::memory should prevent the
> shrinkage of  instructions like  "fstps   fResult(%rip)"behind the
> fence ?
>
> BTW ,if we make fDivident  and  fResult = 0.0f  gloabls,the code
> emitted looks ok  i.e
> #gcc -S test.c -O3  -mmmx -mno-sse
>
>flds.LC0(%rip)
> fstsfDivident(%rip)
> fdivs   .LC1(%rip)
> fstps   fResult(%rip)
> #APP
> # 10 "test.c" 1
> mfence
> # 0 "" 2
> #NO_APP
> fldsfResult(%rip)
> movl$.LC2, %edi
> xorl%eax, %eax
> fstpl   (%rsp)
> callprintf
>
> So i strongly believe that ,its compiler issue and please feel free
> correct me in any case.
>
> Thank you and waiting for your reply.
>
> ~Umesh
>
>
>
>
> On Fri, Apr 13, 2018 at 5:58 PM, Alexander Monakov  wrote:
>> On Fri, 13 Apr 2018, Vivek Kinhekar wrote:
>>> The mfence instruction with memory clobber asm instruction should create a
>>> barrier between division and printf instructions.
>>
>> No, floating-point division does not touch memory, so the asm does not (and
>> need not) restrict its motion.
>>
>> Alexander


Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

2018-05-28 Thread Umesh Kalappa
Ok, thanks for the clarification, Jakub.

Umesh

On Mon, May 7, 2018, 2:08 PM Jakub Jelinek  wrote:

> On Mon, May 07, 2018 at 01:58:48PM +0530, Umesh Kalappa wrote:
> > CCed Jakub,
>
> > > Agree that float division don't touch memory ,but fdiv  result (stack
> > > register ) is stored  back to a memory i.e fResult .
>
> That doesn't really matter.  It is stored to a stack spill slot, something
> that doesn't have address taken and other code (e.g. in other threads)
> can't
> in a valid program access it.  That is not considered memory for the
> inline-asm, only objects that must live in memory count.
>
> Jakub
>
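
The distinction drawn in this thread can be sketched as follows,
assuming a GCC-compatible compiler. The "memory" clobber only orders
accesses to objects that must live in memory, such as the globals
below; a local whose address is never taken may stay in a register or a
spill slot and is not affected. The globals mirror the names used
earlier in the thread; COMPILER_BARRIER and divide_then_fence are
hypothetical names, and the empty asm emits no mfence, it is a
compiler-level barrier only.

```c
#include <assert.h>

float fDivident = 10.0f;
float fResult;

/* Emits no instruction, but tells the compiler that memory may be read
   or written at this point. */
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

float
divide_then_fence (float divisor)
{
  fResult = fDivident / divisor;  /* store to an object that lives in memory */
  COMPILER_BARRIER ();            /* that store may not move below this point */
  return fResult;                 /* reloaded after the barrier */
}
```

The division itself may still be scheduled freely, as Alexander points
out; only the store to and the reload from fResult are pinned relative
to the asm statement.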


GCC 8.1 :Store Merge pass issue (-fstore-merging).

2018-07-11 Thread Umesh Kalappa
Hi Everyone ,

We have the case below, where the store merging pass performs an invalid
optimization (that is our observation on PowerPC):

C case:

typedef unsigned int UINT32;

typedef union
{
UINT32 regVal;
struct
{
UINT32 mask:1;
UINT32 a:1;
UINT32 :6;
UINT32 p:1;
UINT32 s:1;
UINT32 :2;
UINT32 priority:4;
UINT32 vector:16;
} field;
} MPIC_IVPR;

UINT32 test(UINT32 vector)
{
MPIC_IVPR   mpicIvpr;

mpicIvpr.regVal = 0;
mpicIvpr.field.vector = vector;
mpicIvpr.field.priority = 0xe;

return mpicIvpr.regVal;
}


  gcc -O2 -S   test.c

  ...
lis 3,0xe   ;; mpicIvpr.field.priority = 14 (0xe); vector is lost
blr
  ...

The store-merging dump is:

Processing basic block <2>:
Starting new chain with statement:
mpicIvpr.regVal = 0;
The base object is:
&mpicIvpr
Recording immediate store from stmt:
mpicIvpr.field.vector = _1;
Recording immediate store from stmt:
mpicIvpr.field.priority = 14;
stmt causes chain termination:
_7 = mpicIvpr.regVal;
Attempting to coalesce 3 stores in chain.
Store 0:
bitsize:32 bitpos:0 val:
0

Store 1:
bitsize:4 bitpos:12 val:
14

Store 2:
bitsize:16 bitpos:16 val:
_1


After writing 0 of size 32 at position 0 the merged region contains:
0 0 0 0 0 0 0 0
After writing 14 of size 4 at position 12 the merged region contains:
0 e 0 0 0 0 0 0
Coalescing successful!
Merged into 1 stores
New sequence of 1 stmts to replace old one of 2 stmts
# .MEM_6 = VDEF <.MEM_5>
MEM[(union  *)&mpicIvpr] = 917504;
Merging successful!
Volatile access terminates all chains
test (UINT32 vector)
{
  union MPIC_IVPR mpicIvpr;
  short unsigned int _1;
  UINT32 _7;

   [local count: 1073741825]:
  _1 = (short unsigned int) vector_4(D);
  mpicIvpr.field.vector = _1;
  MEM[(union  *)&mpicIvpr] = 917504;
  _7 = mpicIvpr.regVal;
  mpicIvpr ={v} {CLOBBER};
  return _7;

}


As the dump shows, the stores to .regVal and .priority are merged into a
single store, since the RHS operand is a constant in both statements.
The merged store is placed after "mpicIvpr.field.vector = _1;", so it
overwrites the vector field and turns that statement into dead code,
which a later DCE pass deletes. This results in the incorrect asm shown
above.

We are in the process of debugging the store-merging pass. We think that
"mpicIvpr.field.vector = _1;" should clobber the first store
"mpicIvpr.regVal = 0;" in the merged-store vector, but it does not. If we
disable the handling of overlapping stores, the above case works, i.e.
if (0)
  {
    /* |---store 1---|
            |---store 2---|
       Overlapping stores.  */
    if (IN_RANGE (info->bitpos, merged_store->start,
                  merged_store->start + merged_store->width - 1))
      {
        if (info->rhs_code == INTEGER_CST
            && merged_store->stores[0]->rhs_code == INTEGER_CST)
          {
            merged_store->merge_overlapping (info);
            continue;
          }
      }
  }

Before we conclude, we would like to hear any comments from the
community that would help us resolve the issue. As expected, the above
case also works when the store-merging pass is disabled.

Thank you; we look forward to any suggestions.
~Umesh
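
For reference, the value test() ought to return can be worked out by hand from the bit positions in the dump. The sketch below is only an illustration: build_ivpr and expected_regval_be are names invented here, and expected_regval_be assumes the big-endian layout implied by the dump (priority at bit position 12, vector at bit position 16, counted from the most significant bit).

```c
typedef unsigned int UINT32;

typedef union
{
    UINT32 regVal;
    struct
    {
        UINT32 mask:1;
        UINT32 a:1;
        UINT32 :6;
        UINT32 p:1;
        UINT32 s:1;
        UINT32 :2;
        UINT32 priority:4;
        UINT32 vector:16;
    } field;
} MPIC_IVPR;

/* Same three statements as test() in the report. */
static MPIC_IVPR build_ivpr(UINT32 vector)
{
    MPIC_IVPR r;

    r.regVal = 0;
    r.field.vector = vector;
    r.field.priority = 0xe;
    return r;
}

/* Assumed big-endian layout taken from the dump: a field of SIZE bits at
   bit position POS sits (32 - POS - SIZE) bits above the LSB. */
static UINT32 expected_regval_be(UINT32 vector)
{
    UINT32 priority_shift = 32 - 12 - 4;    /* = 16 */
    UINT32 vector_shift   = 32 - 16 - 16;   /* = 0  */

    return (0xeu << priority_shift) | ((vector & 0xffffu) << vector_shift);
}
```

With vector = 0 this gives 917504 (0xE0000), the constant seen in the merged store; for any other vector the low 16 bits should be set as well, which is exactly what the "lis 3,0xe" sequence drops.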


Fwd: GCC 8.1 :Store Merge pass issue (-fstore-merging).

2018-07-11 Thread Umesh Kalappa
Cc'ed Kyrill.


-- Forwarded message -
From: Umesh Kalappa 
Date: Wed, Jul 11, 2018, 7:37 PM
Subject: GCC 8.1 :Store Merge pass issue (-fstore-merging).
To: 




Re: Fwd: GCC 8.1 :Store Merge pass issue (-fstore-merging).

2018-07-11 Thread Umesh Kalappa
Thank you, Jakub. My bad; I will use Bugzilla next time.

Umesh

On Wed, Jul 11, 2018, 10:17 PM Jakub Jelinek  wrote:

> On Wed, Jul 11, 2018 at 09:48:07PM +0530, Umesh Kalappa wrote:
> > Cc'ed Kyrill.
>
> Mailing list is not the right medium to report bugs.
> I've filed http://gcc.gnu.org/PR86492 for you, but please next time use
> bugzilla.
>
> Jakub
>


Re: Fwd: GCC 8.1 :Store Merge pass issue (-fstore-merging).

2018-07-12 Thread Umesh Kalappa
Thank you, Jakub. The patch attached to PR86492 fixes the issue.

I appreciate your quick response here.

~Umesh

On Wed, Jul 11, 2018 at 10:17 PM, Jakub Jelinek  wrote:
> On Wed, Jul 11, 2018 at 09:48:07PM +0530, Umesh Kalappa wrote:
>> Cc'ed Kyrill.
>
> Mailing list is not the right medium to report bugs.
> I've filed http://gcc.gnu.org/PR86492 for you, but please next time use
> bugzilla.
>
> Jakub


Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-12 Thread Umesh Kalappa
Hi everyone,

We have a source base that is compiled for armv7 with gcc 8.1 and
soft-float. For the following input

a=0x0010
b=0x0001

 result = a - b ;

we get the result "0x000e", whereas with
-mhard-float (with flush-to-zero mode disabled) we get the
result "0x000f", as expected.

While debugging the soft-float code, we see that the compiler calls
the intrinsic "__aeabi_dsub" following the ARM calling convention, i.e.
passing "a" in the r0 and r1 registers, and "b" in the next register pair.

We are investigating why the "__aeabi_dsub" routine, which comes from
libgcc, produces the incorrect result. Meanwhile we would like to know:

a) Do the libgcc routines/intrinsics for float operations support
subnormal values? If so, how can we enable that support?

Thank you
~Umesh
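
A host-side reference for this kind of check can be sketched as follows. The operand values in the message above appear truncated and are not reconstructed here; as a stand-in, the sketch assumes the smallest normal double (0x1p-1022) minus the smallest subnormal (0x1p-1074), and the helper names double_bits and sub_bits are invented for the example. It also assumes the host FPU does not flush subnormals to zero.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE 754 bit pattern of a double. */
static uint64_t double_bits(double d)
{
    uint64_t u;

    memcpy(&u, &d, sizeof u);
    return u;
}

/* Subtract at run time; volatile keeps the compiler from folding
   the operation at compile time. */
static uint64_t sub_bits(double x, double y)
{
    volatile double a = x, b = y;
    volatile double r = a - b;

    return double_bits(r);
}
```

With gradual underflow, sub_bits(0x1p-1022, 0x1p-1074) yields the largest subnormal, 0x000fffffffffffff; a routine that mishandles subnormal operands would return something else.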


Re: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-13 Thread Umesh Kalappa
Thank you; the issue has been raised at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86512

~Umesh

On Thu, Jul 12, 2018 at 9:33 PM, Szabolcs Nagy  wrote:
> On 12/07/18 16:20, Umesh Kalappa wrote:
>>
>> Hi everyone,
>>
>> we have our source base ,that was compiled for armv7 on gcc8.1 with
>> soft-float and for following input
>>
>> a=0x0010
>> b=0x0001
>>
>>   result = a - b ;
>>
>> we are getting the result as "0x000e" and with
>> -mhard-float (disabled the flush to zero mode ) we are getting the
>> result as ""0x000f" as expected.
>>
>
> please submit it as a bug report to bugzilla
>
>
>> while debugging the soft-float code,we see that ,the compiler calls
>> the intrinsic "__aeabi_dsub" with arm calling conventions i.e passing
>> "a" in r0 and r1 registers and respectively for "b".
>>
>> we are investigating the routine "__aeabi_dsub" that comes from libgcc
>> for incorrect result  and meanwhile we would like to know that
>>
>> a)do libgcc routines/intrinsic for float operations support or
>> consider the subnormal values ? ,if so how we can enable the same.
>>
>> Thank you
>> ~Umesh
>>
>


Re: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-17 Thread Umesh Kalappa
Hi Nagy,

Please help us with your comments on the attached patch for the issue
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86512).

Thank you; we await your input on it.
~Umesh

On Fri, Jul 13, 2018 at 1:22 PM, Umesh Kalappa  wrote:
> Thank you and issue  raised at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86512
>
> ~Umesh
>
> On Thu, Jul 12, 2018 at 9:33 PM, Szabolcs Nagy  wrote:
>> On 12/07/18 16:20, Umesh Kalappa wrote:
>>>
>>> Hi everyone,
>>>
>>> we have our source base ,that was compiled for armv7 on gcc8.1 with
>>> soft-float and for following input
>>>
>>> a=0x0010
>>> b=0x0001
>>>
>>>   result = a - b ;
>>>
>>> we are getting the result as "0x000e" and with
>>> -mhard-float (disabled the flush to zero mode ) we are getting the
>>> result as ""0x000f" as expected.
>>>
>>
>> please submit it as a bug report to bugzilla
>>
>>
>>> while debugging the soft-float code,we see that ,the compiler calls
>>> the intrinsic "__aeabi_dsub" with arm calling conventions i.e passing
>>> "a" in r0 and r1 registers and respectively for "b".
>>>
>>> we are investigating the routine "__aeabi_dsub" that comes from libgcc
>>> for incorrect result  and meanwhile we would like to know that
>>>
>>> a)do libgcc routines/intrinsic for float operations support or
>>> consider the subnormal values ? ,if so how we can enable the same.
>>>
>>> Thank you
>>> ~Umesh
>>>
>>


86512.patch
Description: Binary data


Re: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-17 Thread Umesh Kalappa
Will do, thanks.

On Tue, Jul 17, 2018, 3:24 PM Ramana Radhakrishnan <
ramana@googlemail.com> wrote:

> On Tue, Jul 17, 2018 at 10:41 AM, Umesh Kalappa
>  wrote:
> > Hi Nagy,
> >
> > Please  help us with your comments on the attached patch for the issue
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86512)
> >
> > Thank you and waiting for your inputs on the same.
>
>
> Patches should be sent to gcc-patc...@gcc.gnu.org with a clear
> description of what the patch hopes to
> achieve and why this is correct, how was it tested and if a regression
> test needs to be added - add one please.
> Please read https://gcc.gnu.org/contribute.html before sending a patch.
>
> This is the wrong list to send patches to.
>
> regards
> Ramana
> > ~Umesh
> >
> > On Fri, Jul 13, 2018 at 1:22 PM, Umesh Kalappa 
> wrote:
> >> Thank you and issue  raised at
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86512
> >>
> >> ~Umesh
> >>
> >> On Thu, Jul 12, 2018 at 9:33 PM, Szabolcs Nagy 
> wrote:
> >>> On 12/07/18 16:20, Umesh Kalappa wrote:
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> we have our source base ,that was compiled for armv7 on gcc8.1 with
> >>>> soft-float and for following input
> >>>>
> >>>> a=0x0010
> >>>> b=0x0001
> >>>>
> >>>>   result = a - b ;
> >>>>
> >>>> we are getting the result as "0x000e" and with
> >>>> -mhard-float (disabled the flush to zero mode ) we are getting the
> >>>> result as ""0x000f" as expected.
> >>>>
> >>>
> >>> please submit it as a bug report to bugzilla
> >>>
> >>>
> >>>> while debugging the soft-float code,we see that ,the compiler calls
> >>>> the intrinsic "__aeabi_dsub" with arm calling conventions i.e passing
> >>>> "a" in r0 and r1 registers and respectively for "b".
> >>>>
> >>>> we are investigating the routine "__aeabi_dsub" that comes from libgcc
> >>>> for incorrect result  and meanwhile we would like to know that
> >>>>
> >>>> a)do libgcc routines/intrinsic for float operations support or
> >>>> consider the subnormal values ? ,if so how we can enable the same.
> >>>>
> >>>> Thank you
> >>>> ~Umesh
> >>>>
> >>>
>


O2 Agressive Optimisation by GCC

2018-07-20 Thread Umesh Kalappa
Hi All,

We are looking at the following C sample:

extern int i,j;

int test()
{
    while (1)
    {
        i++;
        j = 20;
    }
    return 0;
}

Command used (gcc 8.1.0):
gcc -S test.c -O2

The generated asm for x86:

.L2:
jmp .L2

We understand that the infinite loop is not deterministic and that the
compiler is free to treat it as UB and optimize aggressively, but we
need side effects like j=20 to be left untouched by the optimization.

Please note that using the volatile qualifier for i and j, or an empty
asm("") in the while loop, will stop the optimizer, but we don't want
to do that.

Can anyone from the community share their insight into why the above
transformation is valid?

And without using volatile or a memory barrier, how can we stop the
above transformation?


Thank you in advance.
~Umesh


Re: O2 Agressive Optimisation by GCC

2018-07-22 Thread Umesh Kalappa
Allan,
>> he might as well go traditional

Do you mean using locks?

Thank you
~Umesh

On Sat, Jul 21, 2018 at 4:20 AM, Allan Sandfeld Jensen
 wrote:
> On Samstag, 21. Juli 2018 00:21:48 CEST Jonathan Wakely wrote:
>> On Fri, 20 Jul 2018 at 23:06, Allan Sandfeld Jensen wrote:
>> > On Freitag, 20. Juli 2018 14:19:12 CEST Umesh Kalappa wrote:
>> > > Hi All ,
>> > >
>> > > We are looking at the C sample i.e
>> > >
>> > > extern int i,j;
>> > >
>> > > int test()
>> > > {
>> > > while(1)
>> > > {   i++;
>> > >
>> > > j=20;
>> > >
>> > > }
>> > > return 0;
>> > > }
>> > >
>> > > command used :(gcc 8.1.0)
>> > > gcc -S test.c -O2
>> > >
>> > > the generated asm for x86
>> > >
>> > > .L2:
>> > > jmp .L2
>> > >
>> > > we understand that,the infinite loop is not  deterministic ,compiler
>> > > is free to treat as that as UB and do aggressive optimization ,but we
>> > > need keep the side effects like j=20 untouched by optimization .
>> > >
>> > > Please note that using the volatile qualifier for i and j  or empty
>> > > asm("") in the while loop,will stop the optimizer ,but we don't want
>> > > do  that.
>> >
>> > But you need to do that! If you want changes to a variable to be
>> > observable in another thread, you need to use either volatile,
>>
>> No, volatile doesn't work for that.
>>
> It does, but you shouldn't use for that due to many other reasons (though the
> linux kernel still does) But if the guy wants to code primitive without using
> system calls or atomics, he might as well go traditional
>
> 'Allan
>
>


Re: O2 Agressive Optimisation by GCC

2018-07-22 Thread Umesh Kalappa
Hi Richard,

After making i unsigned the optimization is still applied, so no luck.
And yes, test() is a thread routine, and since i and j are globals
we need side effects such as the assignments to take place so that they
are observed by other threads.

Making the variables volatile, or using thread-safe or atomic
operations, inhibits the optimization, but we still don't understand
why it is a valid optimization for UB. We also tried
-fno-strict-overflow, with no luck.

Jakub, or anyone: can we inhibit this kind of optimization, which
assumes UB and optimizes accordingly?

Thank you
~Umesh

On Fri, Jul 20, 2018 at 11:47 PM, Richard Biener
 wrote:
> On July 20, 2018 7:59:10 PM GMT+02:00, Martin Sebor  wrote:
>>On 07/20/2018 06:19 AM, Umesh Kalappa wrote:
>>> Hi All ,
>>>
>>> We are looking at the C sample i.e
>>>
>>> extern int i,j;
>>>
>>> int test()
>>> {
>>> while(1)
>>> {   i++;
>>> j=20;
>>> }
>>> return 0;
>>> }
>>>
>>> command used :(gcc 8.1.0)
>>> gcc -S test.c -O2
>>>
>>> the generated asm for x86
>>>
>>> .L2:
>>> jmp .L2
>>>
>>> we understand that,the infinite loop is not  deterministic ,compiler
>>> is free to treat as that as UB and do aggressive optimization ,but we
>>> need keep the side effects like j=20 untouched by optimization .
>>>
>>> Please note that using the volatile qualifier for i and j  or empty
>>> asm("") in the while loop,will stop the optimizer ,but we don't want
>>> do  that.
>>>
>>> Anyone from the community ,please share their insights why above
>>> transformation is right ?
>>
>>The loop isn't necessarily undefined (and compilers don't look
>>for undefined behavior as opportunities to optimize code), but
>
> The variable i overflows.
>
>>because it doesn't terminate it's not possible for a conforming
>>C program to detect the side-effects in its body.  The only way
>>to detect it is to examine the object code as you did.
>
> I'm not sure we perform this kind of dead code elimination but yes, we could. 
> Make i unsigned and check whether that changes behavior.
>
>>Compilers are allowed (and expected) to transform source code
>>into efficient object code as long as the transformations don't
>>change the observable effects of the program.  That's just what
>>happens in this case.
>>
>>Martin
>
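
If other threads really must observe the stores, the replies in this thread point toward atomics rather than plain ints. The following is a minimal sketch, assuming C11 <stdatomic.h> is available; shared_i, shared_j, worker_step and worker_run are hypothetical names, and the loop is bounded only so that the example terminates.

```c
#include <stdatomic.h>

/* Hypothetical replacements for the plain "extern int i, j". */
static atomic_int shared_i;
static atomic_int shared_j;

/* One iteration of the loop body. Atomic fetch-add on a signed type is
   defined to wrap, so the overflow concern raised for i++ does not arise. */
static void worker_step(void)
{
    atomic_fetch_add_explicit(&shared_i, 1, memory_order_relaxed);
    atomic_store_explicit(&shared_j, 20, memory_order_relaxed);
}

/* Bounded stand-in for while(1) so that the sketch terminates. */
static void worker_run(int iterations)
{
    for (int n = 0; n < iterations; n++)
        worker_step();
}
```

Relaxed ordering is used because the original code expresses no ordering requirement between i and j; a stronger memory order would be needed if readers rely on seeing the two updates in a particular order.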


Invalid store semantics

2013-09-30 Thread Umesh Kalappa
Dear All,

I'm looking into the problem below in our private backend.

During RTL expansion the RTL below is emitted:

(insn 6 5 7 (set (reg:SI 23)
(const_int 10 [0xa])) algt_001.c:41 -1
(nil))

(insn 7 6 8 (set (reg:SI 24)
(unspec:SI [
(mem/c/i:SI (symbol_ref:SI ("lsucCnt2.1746") [flags 0x2] [2 lsucCnt2+0 S4 A16])
] 1)) algt_001.c:41 -1
(nil))

(insn 8 7 0 (set (mem:SI (plus:SI (reg:SI 24)
(const_int 0 [0])) [0 S4 A16])
(reg:SI 23)) algt_001.c:41 -1
(nil))

With optimisation (-O3) enabled, the above RTL is transformed to

(insn 7 6 8 2 (set (reg:SI 24)
(unspec:SI [
(mem/c/i:SI (symbol_ref:SI ("lsucCnt2.1746") [flags 0x2] ) [2
lsucCnt2+0 S4 A16])
] 1)) algt_001.c:41 59 {tx03_movw}
(nil))

(insn 8 7 0 2 (set (mem:SI (reg:SI 24) [0 S4 A16])
(const_int 10 [0xa])) algt_001.c:41 42 {storesi}
(expr_list:REG_DEAD (reg:SI 24)
(nil)))


Insn 6 has been deleted and the constant 10 propagated into insn 8, so
we finally end up emitting an instruction like "str 10, [mem]", which is
invalid syntax because the store does not allow a constant source.

I'm trying to handle the above problem by introducing a scratch
register in the store template and then using a peephole/split to force
the constant into the scratch register.

Before I do so, I would like to know whether the proposed solution is
doable, or whether a better solution exists.

Looking for some suggestions here.

Thanks
~Umesh


cortex-m3(gcc.4.6.3)

2013-10-09 Thread Umesh Kalappa
Dear Group,

The asm below is generated for the cortex-m3 target (gcc-4.6.3):

main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
push {r3, r4, r5, lr}
bl  vAlgTNoOptimize
movs r0, #170
bl  vFnCall
bl  vAlgTOptimize
ldr r4, .L22
add r5, r4, #88
.L20:
ldr r1, [r4, #4]!
movs r0, #2
ldr r2, [r4, #88]
bl  ucCheckResult
cbnz r0, .L19
cmp r4, r5
bne .L20
ldr r0, .L22+4
ldr r1, .L22+8
pop {r3, r4, r5, lr}
b   printf
.L19:
ldr r0, .L22+12
ldr r1, .L22+8
pop {r3, r4, r5, lr}
b   printf

From the main prologue we can see the instruction "push
{r3, r4, r5, lr}", which saves r3-r5 and lr; the same registers are
restored in the epilogue. Saving r4 and r5 (callee-saved) and
lr (call_used_register) makes sense here w.r.t. the ARM EABI, but I
don't understand why we save r3 here when we don't use r3 in the function.

Or am I missing something here?

Thanks
~Umesh
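
One likely explanation, which is not confirmed anywhere in this thread, is stack alignment: the ARM procedure call standard requires the stack pointer to be 8-byte aligned at public interfaces, and pushing one extra register is a cheap way to pad the frame. The arithmetic can be sketched as below; padded_push_bytes and padding_regs are names invented for the illustration.

```c
/* Bytes pushed once the register count is padded so that SP stays
   8-byte aligned (4 bytes per register). */
static unsigned padded_push_bytes(unsigned nregs)
{
    unsigned bytes = nregs * 4u;

    return (bytes + 7u) & ~7u;
}

/* Number of extra registers needed purely as padding. */
static unsigned padding_regs(unsigned nregs)
{
    return (padded_push_bytes(nregs) - nregs * 4u) / 4u;
}
```

For main above, the registers that genuinely need saving are r4, r5 and lr (12 bytes); padding to 16 bytes needs one more register, which would account for r3 in the push list.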


GCC retargeting

2013-10-09 Thread Umesh Kalappa
Dear Group,

We are retargeting GCC to a CISC target which has eight
8-bit registers; the same register set can be used as register pairs for
16-bit computation, i.e. as four 16-bit registers.

Can anyone in the group tell me how to model this requirement using
the target macros like

REG_CLASS_NAMES and REG_CLASS_CONTENTS etc.?


Thanks
~Umesh


Re: GCC retargeting

2013-10-10 Thread Umesh Kalappa
Dear Paul,

Thanks for the inputs. Yes, I looked at them, but my basic query remains.

The target we are porting has 8-bit registers such as A and B, used in asm as

ld A,0xff
ld B,0xff

after which A and B each hold the value 0xff. AB can also be used as a
16-bit pair:

ld AB,0xeeff

after which A holds 0xff and B holds 0xee.

I model the above registers as

enum reg_class { NO_REGS, A_REGS, B_REGS, AB_REGS, GEN_REGS, ALL_REGS,
                 LIM_REG_CLASSES };

#define REG_CLASS_CONTENTS {{0}, {0x1}, {0x2}, {0x3}, {0x3}, {0x3}}

#define REG_CLASS_NAMES \
  { "NO_REGS",                                                    \
    "A_REGS",   /* the 8-bit A register */                        \
    "B_REGS",   /* the 8-bit B register */                        \
    "AB_REGS",  /* the 16-bit AB pair */                          \
    "GEN_REGS", /* 8-bit A and B, or the 16-bit AB pair */        \
    "ALL_REGS"  /* 8-bit A and B, or the 16-bit AB pair */ }
#define N_REG_CLASSES (int) LIM_REG_CLASSES

#define REGISTER_NAMES \
{"a", "b", "ab" }

I'm not sure whether I can model register names like a, b and ab, since
my target has only two registers but I have given them three names
(a, b and ab). Please correct me, and pardon me for bothering you
here.

Thanks; waiting for someone from the expert group to throw some light on this.
~Umesh
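
Ports such as the pdp11 one suggested in the quoted reply below usually do not give a pair its own register name; instead a wider value occupies consecutive hard registers. The sketch below models that idea in standalone C under stated assumptions: hard_regno_nregs and hard_regno_mode_ok are simplified stand-ins for the GCC macros of similar name, they take a size in bytes instead of a machine mode, and the two-register file (A = 0, B = 1) is the one described in this message.

```c
#define UNITS_PER_WORD        1   /* assumed: each hard register holds one byte */
#define FIRST_PSEUDO_REGISTER 2   /* assumed: only A (0) and B (1) exist */

/* How many consecutive hard registers a value of size_bytes needs. */
static int hard_regno_nregs(int size_bytes)
{
    return (size_bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
}

/* Whether a value of size_bytes may start at hard register regno.
   Multi-register values must start on an even register and fit
   inside the register file. */
static int hard_regno_mode_ok(int regno, int size_bytes)
{
    int n = hard_regno_nregs(size_bytes);

    if (regno < 0 || regno + n > FIRST_PSEUDO_REGISTER)
        return 0;
    return n == 1 || (regno % 2) == 0;
}
```

Under this model REGISTER_NAMES would list only "a" and "b", and the "ab" spelling would be produced by the output templates when a 16-bit value is printed; whether that fits the real target is for the port author to judge.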


On Wed, Oct 9, 2013 at 7:40 PM,  wrote:
>
>
> On Oct 9, 2013, at 5:24 AM, Umesh Kalappa  wrote:
>
> > Dear Group ,
> >
> > We are re-targeting the GCC to the CISC target ,which  has the eight
> > 8-bit registers  and same register set can used as  pair register for
> > 16 bit computation  i.e four  16-bits .
> >
> > Any one in the group tell me ,How do i  model this requirement using
> > the target macros like
> >
> > REG_CLASS_NAMES and REG_CLASS_CONTENTS etc.
> >
> >
> > Thanks
> > ~Umesh
>
> There probably are other examples, but one you could look at is pdp11, which 
> has 16 bit registers that also can be used in even/odd pairs for 32 bit 
> operations.
>
> paul


GCC IRA support for Register Banks.

2013-10-11 Thread Umesh Kalappa
Dear All,

Does GCC provide any hook to support register banks?

Our private target has two banks of registers: the A and B
registers under bank 0, and A and B (the same names, weird, huh) registers
under bank 1.

asm sample like

load A ,mem-0//By default the register referred from bank-0
load B ,mem-1
add A,B

set b1 // select bank-1
load A, mem1-0
load B ,mem1-1
add A,B
store A mem1-0
unset b1//revert it back to bank-0

load B mem1-0   //again the register referred from bank-0
add A,B

ret

Can we represent the above register bank requirement? Please let me know
if so.

Thanks
~Umesh


Re: function attributes

2013-10-16 Thread Umesh Kalappa
Are you still stuck with this issue?

~Umesh

On Tue, Oct 15, 2013 at 9:08 PM, Ian Lance Taylor  wrote:
> On Tue, Oct 15, 2013 at 8:04 AM, Nagaraju Mekala  
> wrote:
>>  Hi Ian,
>>
>>   Thanks for the reply.
>>
>> On Fri, Oct 11, 2013 at 10:31 PM, Ian Lance Taylor  wrote:
>>> On Fri, Oct 11, 2013 at 9:20 AM, Nagaraju Mekala  
>>> wrote:

>>>> I observed that in rs6000 port longcall is implemented by using
>>>> CALL_LONG define.
>>>> #define CALL_LONG 0x0008 /* always call indirect */
>>>> In the md file they are checking the operand with CALL_LONG
>>>> if (INTVAL (operands[3]) & CALL_LONG)
>>>> operands[1] = rs6000_longcall_ref (operands[1]);
>>>> In my port I dont have suchthing to compare. Can we somehow parse the
>>>> tree chain and check the attributes of the functions..
>>>
>>> Look at init_cumulative_args in rs6000.c to see how CALL_LONG is set
>>> based on the function attribute.
>>
>> I was able to get the function attribute from the init_cumulative_args
>> function.  I have used the fndecl tree to get the attribute details
>> but I have failed to stop generating br instruction. It should print
>> bk instruction.
>> I was unable to relate the super attribute from init_cumulative_args
>> to the branch pattern in md file to generate bk instruction.
>> I have intialized a global variable to 1 if super is detected and
>> checking the same in my pattern.
>>  My branch pattern looks like below
>> (define_insn "call_int1"
>>   [(call (mem (match_operand:SI 0 "call_insn_simple_operand" "ri"))
>>  (match_operand:SI 1 "" "i"))
>>   (clobber (reg:SI R_RS))]
>>  ""
>>   {
>> register rtx t = operands[0];
>> register rtx t2 = gen_rtx_REG (Pmode,
>>   GP_REG_FIRST + RETURN_ADDR_REGNUM);
>> if (GET_CODE (t) == SYMBOL_REF) {
>> if(super_var()) ---> Here I am
>> checking for global variable
>> {
>> return "bk\tr1,8\;%#";
>> }
>> else {
>> gen_rtx_CLOBBER (VOIDmode, t2);
>> return "br\tr1,%0\;%#";
>>
>> I observed that init_cumulative_args is called first for all the
>> functions once they are done then the above pattern for all the
>> instructions are called so my global variable is not useful.
>>
>> Can you help me how to exactly emit bk instruction from the pattern
>> when super function is called.
>
>
> Again I just have to say: look at the rs6000 port.  Look at the rs6000
> call instruction.  Look at how it decides whether to do a longcall or
> not.
>
> Ian


Re: function attributes

2013-10-16 Thread Umesh Kalappa
Here you go:

a) Define a new field in the struct CUMULATIVE_ARGS, say "int long_call;".

b) Set the field long_call to a known value in init_cumulative_args().

c) In the TARGET_FUNCTION_ARG hook, return that value as a constant.
The last time this hook is called, it is called with
MODE == VOIDmode, and its result is passed to the call or call_value
pattern as operand 2 or 3 respectively.

if (mode == VOIDmode)
  return GEN_INT (cum->long_call);

d) Handle operands[2] in the call pattern as

(define_insn "call_name"
  [(call (mem:SI (match_operand:MODE 0 "" ""))
         (match_operand 1 "" ""))
   (use (match_operand:SI 2 "immediate_operand" ""))]
  ""
  {
    if (INTVAL (operands[2]) & long_call)
      return "branch long";
    else
      return "branch other";
  })

e) Do the same for call_value, where you need to check operands[3] instead.


Hope this helps.
Thanks
~Umesh


On Wed, Oct 16, 2013 at 2:26 PM, Nagaraju Mekala  wrote:
> Yes.. I still had no luck.
> Do you have any thoughts on this??
>
> On Wed, Oct 16, 2013 at 2:05 PM, Umesh Kalappa  
> wrote:
>> You still stuck with this issue ???
>>
>> ~Umesh
>>
>> On Tue, Oct 15, 2013 at 9:08 PM, Ian Lance Taylor  wrote:
>>> On Tue, Oct 15, 2013 at 8:04 AM, Nagaraju Mekala  
>>> wrote:
>>>>  Hi Ian,
>>>>
>>>>   Thanks for the reply.
>>>>
>>>> On Fri, Oct 11, 2013 at 10:31 PM, Ian Lance Taylor  wrote:
>>>>> On Fri, Oct 11, 2013 at 9:20 AM, Nagaraju Mekala  
>>>>> wrote:
>>>>>>
>>>>>> I observed that in rs6000 port longcall is implemented by using
>>>>>> CALL_LONG define.
>>>>>> #define CALL_LONG 0x0008 /* always call indirect */
>>>>>> In the md file they are checking the operand with CALL_LONG
>>>>>> if (INTVAL (operands[3]) & CALL_LONG)
>>>>>> operands[1] = rs6000_longcall_ref (operands[1]);
>>>>>> In my port I dont have suchthing to compare. Can we somehow parse the
>>>>>> tree chain and check the attributes of the functions..
>>>>>
>>>>> Look at init_cumulative_args in rs6000.c to see how CALL_LONG is set
>>>>> based on the function attribute.
>>>>
>>>> I was able to get the function attribute from the init_cumulative_args
>>>> function.  I have used the fndecl tree to get the attribute details
>>>> but I have failed to stop generating br instruction. It should print
>>>> bk instruction.
>>>> I was unable to relate the super attribute from init_cumulative_args
>>>> to the branch pattern in md file to generate bk instruction.
>>>> I have intialized a global variable to 1 if super is detected and
>>>> checking the same in my pattern.
>>>>  My branch pattern looks like below
>>>> (define_insn "call_int1"
>>>>   [(call (mem (match_operand:SI 0 "call_insn_simple_operand" "ri"))
>>>>  (match_operand:SI 1 "" "i"))
>>>>   (clobber (reg:SI R_RS))]
>>>>  ""
>>>>   {
>>>> register rtx t = operands[0];
>>>> register rtx t2 = gen_rtx_REG (Pmode,
>>>>   GP_REG_FIRST + RETURN_ADDR_REGNUM);
>>>> if (GET_CODE (t) == SYMBOL_REF) {
>>>> if(super_var()) ---> Here I am
>>>> checking for global variable
>>>> {
>>>> return "bk\tr1,8\;%#";
>>>> }
>>>> else {
>>>> gen_rtx_CLOBBER (VOIDmode, t2);
>>>> return "br\tr1,%0\;%#";
>>>>
>>>> I observed that init_cumulative_args is called first for all the
>>>> functions once they are done then the above pattern for all the
>>>> instructions are called so my global variable is not useful.
>>>>
>>>> Can you help me how to exactly emit bk instruction from the pattern
>>>> when super function is called.
>>>
>>>
>>> Again I just have to say: look at the rs6000 port.  Look at the rs6000
>>> call instruction.  Look at how it decides whether to do a longcall or
>>> not.
>>>
>>> Ian


Make SImode as default mode for INT type.

2013-12-06 Thread Umesh Kalappa
Hi all,

We are retargeting GCC 4.8.1 to a 16-bit core, where word = int
= short = pointer = 16 bits, char = 8 bits and long = 32 bits.

We model the above requirement as

#define BITS_PER_UNIT    8
#define BITS_PER_WORD    16
#define UNITS_PER_WORD   2
#define POINTER_SIZE     16
#define SHORT_TYPE_SIZE  16
#define INT_TYPE_SIZE    16
#define LONG_TYPE_SIZE   32
#define FLOAT_TYPE_SIZE  16
#define DOUBLE_TYPE_SIZE 32

We tried to compile the sample below with the retargeted compiler:

int a = 10;
int b = 10;

int func()
{
    return a + b;
}

The compiler treats a and b as globals of short type (HImode),
2 bytes in size,

whereas we need the word mode to be SI, not HI. I understand that
SI and HI would be the same size here, but we would still prefer SI
mode.

Could somebody in the group share their thoughts on how we can
achieve this?
Thanks
~Umesh


[Warning] Signed mistach for basic datatype.

2013-12-06 Thread Umesh Kalappa
Hi All,

The sample below caught my attention:

int a ;
unsigned int  b;
int func()
{
return a =b;
}
The compiler didn't warn me about the signedness mismatch in the above
case, whereas for

int *a ;
unsigned int  *b;
int func()
{
a =b;
return *a;
}
the compiler warns:

warning: pointer targets in assignment differ in signedness [-Wpointer-sign]

I'm a bit confused; am I missing something here?

Any thoughts?

Thanks
~Umesh
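
The difference between the two cases can be sketched as follows. Assigning an unsigned int to an int is an ordinary implicit arithmetic conversion, which GCC reports only under -Wsign-conversion (in C also enabled by -Wconversion); int * and unsigned int * are not compatible pointer types, hence the -Wpointer-sign diagnostic. The function and variable names below are invented for the example.

```c
int a;
unsigned int b;

/* Implicit conversion between arithmetic types: always permitted. */
static int assign_value(void)
{
    return a = b;   /* reported only with -Wsign-conversion */
}

int *pa;
unsigned int *pb;

/* int * and unsigned int * are not compatible types, so an explicit
   cast is needed to assign without the -Wpointer-sign warning. */
static int assign_pointer(void)
{
    pa = (int *)pb;
    return *pa;
}
```

Reading an unsigned int object through an int pointer is permitted by the aliasing rules, so the cast changes only the diagnostic, not the behaviour, as long as the value fits in an int.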


Unoptimal code.

2013-12-10 Thread Umesh Kalappa
Hi All,

Below are the patterns defined for the mov and add instructions.
[(set (match_operand:HI 0 "general_mov_operand" "=r,rRA")
(match_operand:HI 1 "general_mov_operand" "rRAi,ri"))]
  ""
 {

}
)

(define_insn "addhi3"
  [(set (match_operand:HI 0 "register_operand" "=Ar")
(plus:HI (match_operand:HI 1 "register_operand" "%0")
 (match_operand:HI 2 "general_mov_operand" "Ar")))]
  ""
  "add\t%0, (%2)"
)

The problem we are stuck with is that the compiler emits suboptimal code
for the test case below with the -O0 option:

int a,b;

int func()
{
    return a += b;
}

.s file

 ld  BC, (a)
 ld  WA, (b)
 add WA, BC
 ld  (a), WA
 ret

In expand_assignment the compiler loads a and b into the registers BC
and WA respectively, adds them, and then stores the result back to a.

But the addhi3 definition states that I'm allowed to
emit an instruction like

add WA,(a)

where the second operand can use register-indirect addressing.

I can write a peephole pattern to optimize the emitted code into

.s file

 ld  WA, (b)
 add WA, (a)
 ld  (a), WA
 ret


The reason for the suboptimal code is that the code is expanded to
load the memory contents into registers and then update the add
operands accordingly. I don't want this to happen.

I will be glad if somebody from the group can share their experience or
throw some light on how I can achieve this.

Thanks
~Umesh


Invalid code emitted

2014-01-21 Thread Umesh Kalappa
Hi All ,

The following C  code snippet

unsigned char c ;
int d ;

int test ()
{
 d = c;
return d;
}

gives the RTL below without optimisation enabled:

(insn 6 5 0 (set (reg:QI 18 [ c.0 ])

(mem/c:QI (symbol_ref:HI ("c")  ) [0
c+0 S1 A8])) cnv.c:5 -1

 (nil))


(insn 7 6 8 (set (reg:QI 19)

(const_int 0 [0])) cnv.c:5 -1

 (nil))



(insn 8 7 0 (set (reg:HI 19 [ d.1 ])

(subreg:HI (reg:QI 18 [ c.0 ]) 0)) cnv.c:5 -1

 (expr_list:REG_EQUAL (zero_extend:HI (reg:QI 18 [ c.0 ]))

(nil)))



(insn 9 8 0 (set (mem/c:HI (symbol_ref:HI ("d")  ) [0 d+0 S2 A16])

(reg:HI 19 [ d.1 ])) cnv.c:5 -1

 (nil))



(insn 10 9 0 (set (reg:HI 20 [ D.1323 ])

(mem/c:HI (symbol_ref:HI ("d")  ) [0
d+0 S2 A16])) cnv.c:6 -1

 (nil))

and the respective assembly:

ld  C, (c)
ld  B, 0
ld  (d), BC
ld  WA, (d)
ret


The problem arises when I enable optimisation with -O3 (together with -da).

The RTL is expanded as shown above, but after the subreg pass (*.subreg)
we see the RTL below:


(note 4 0 3 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(note 3 4 8 2 NOTE_INSN_FUNCTION_BEG)

(insn 8 3 9 2 (set (mem/c:HI (symbol_ref:HI ("d")  ) [2 d+0 S2 A16])

(reg:HI 19 [ d.1 ])) cnv.c:5 9 {*movhi}

 (nil))

(insn 9 8 13 2 (set (reg:HI 20 [  ])

(reg:HI 19 [ d.1 ])) cnv.c:6 9 {*movhi}

 (nil))

(insn 13 9 16 2 (set (reg/i:HI 0 W)

(reg:HI 20 [  ])) cnv.c:7 9 {*movhi}

 (nil))

(insn 16 13 0 2 (use (reg/i:HI 0 W)) cnv.c:7 -1

 (nil))

;;  su

and the respective assembly emitted is:


ld  C, (c)
ld  (d), BC ;This is invalid  since B can have a clobbered value
ld  WA, (d)
ret

I would like to know whether there is any target hook to suppress the
above optimization so that I end up emitting valid instructions.

Any light on the above problem is appreciated.

Thanks in advance
~Umesh
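
To spell out why the -O3 sequence above is wrong, here is a small self-contained C model. It is only a sketch: the struct and function names are made up, and it assumes that B is the high byte and C the low byte of the pair BC. Zero extension has to write both halves of the pair; loading only C leaves whatever was in B.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 16-bit pair BC built from two 8-bit registers. */
struct reg_pair {
    uint8_t b;  /* high byte (assumed) */
    uint8_t c;  /* low byte (assumed)  */
};

static uint16_t pair_value(struct reg_pair bc)
{
    return (uint16_t)(((uint16_t)bc.b << 8) | bc.c);
}

/* Correct zero extension: both halves of the pair are written. */
uint16_t zero_extend_into_pair(struct reg_pair stale, uint8_t value)
{
    stale.c = value;  /* ld C, (c) */
    stale.b = 0;      /* ld B, 0   */
    return pair_value(stale);
}

/* What the -O3 sequence effectively does: B keeps its old contents. */
uint16_t load_low_byte_only(struct reg_pair stale, uint8_t value)
{
    stale.c = value;  /* ld C, (c) */
    return pair_value(stale);
}

/* Self-check with a deliberately dirty pair. */
void check_zero_extend_sketch(void)
{
    struct reg_pair stale = { 0xAB, 0xCD };
    assert(zero_extend_into_pair(stale, 0x7F) == 0x007F);
    assert(load_low_byte_only(stale, 0x7F) == 0xAB7F);
}
```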


Re: Invalid code emitted

2014-01-21 Thread Umesh Kalappa
My bad, Ian.

Thanks for the input. The target is private and the GCC version used is
4.8.1, and you are right about the register pairing.

Let me have a look at the port.

Thanks Again
~Umesh


On Tue, Jan 21, 2014 at 8:12 PM, Ian Lance Taylor  wrote:
> On Tue, Jan 21, 2014 at 12:52 AM, Umesh Kalappa
>  wrote:
>>
>> The following C  code snippet
>>
>> unsigned char c ;
>> int d ;
>>
>> int test ()
>> {
>>  d = c;
>> return d;
>> }
>>
>> below is the RTL without optimisation enabled
>>
>> (insn 6 5 0 (set (reg:QI 18 [ c.0 ])
>>
>> (mem/c:QI (symbol_ref:HI ("c")  ) [0
>> c+0 S1 A8])) cnv.c:5 -1
>>
>>  (nil))
>>
>>
>> (insn 7 6 8 (set (reg:QI 19)
>>
>> (const_int 0 [0])) cnv.c:5 -1
>>
>>  (nil))
>>
>>
>>
>> (insn 8 7 0 (set (reg:HI 19 [ d.1 ])
>>
>> (subreg:HI (reg:QI 18 [ c.0 ]) 0)) cnv.c:5 -1
>>
>>  (expr_list:REG_EQUAL (zero_extend:HI (reg:QI 18 [ c.0 ]))
>>
>> (nil)))
>>
>>
>>
>> (insn 9 8 0 (set (mem/c:HI (symbol_ref:HI ("d")  ) [0 d+0 S2 A16])
>>
>> (reg:HI 19 [ d.1 ])) cnv.c:5 -1
>>
>>  (nil))
>>
>>
>>
>> (insn 10 9 0 (set (reg:HI 20 [ D.1323 ])
>>
>> (mem/c:HI (symbol_ref:HI ("d")  ) [0
>> d+0 S2 A16])) cnv.c:6 -1
>>
>>  (nil))
>>
>> and  respective ASM
>>
>> ld  C, (c)
>> ld  B, 0
>> ld  (d), BC
>> ld  WA, (d)
>> ret
>>
>>
>> But problem arises when i enabled the optimisation -O3 ( with -da)  .
>>
>> RTL is expanded  as show above, but after subreg pass(*.subreg)   we
>> see the below RTL
>>
>>
>> (note 4 0 3 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>
>> (note 3 4 8 2 NOTE_INSN_FUNCTION_BEG)
>>
>> (insn 8 3 9 2 (set (mem/c:HI (symbol_ref:HI ("d")  ) [2 d+0 S2 A16])
>>
>> (reg:HI 19 [ d.1 ])) cnv.c:5 9 {*movhi}
>>
>>  (nil))
>>
>> (insn 9 8 13 2 (set (reg:HI 20 [  ])
>>
>> (reg:HI 19 [ d.1 ])) cnv.c:6 9 {*movhi}
>>
>>  (nil))
>>
>> (insn 13 9 16 2 (set (reg/i:HI 0 W)
>>
>> (reg:HI 20 [  ])) cnv.c:7 9 {*movhi}
>>
>>  (nil))
>>
>> (insn 16 13 0 2 (use (reg/i:HI 0 W)) cnv.c:7 -1
>>
>>  (nil))
>>
>> ;;  su
>>
>> and respective asm emited as
>>
>>
>> ld  C, (c)
>> ld  (d), BC ;This is invalid  since B can have a clobbered value
>> ld  WA, (d)
>> ret
>>
>> Would like know there exist any target hook to surpass the above
>> optimization so that i ended up emitting valid instructions.
>>
>> Any lights on the above problem is appreciated .
>
>
> You didn't say what target you are using, you didn't say what version
> of GCC you are using, and you didn't give enough information to
> understand what is happening.  I don't know what the "ld (d),BC"
> syntax means, but perhaps it means that two different registers, B and
> C, are combined to form a single value.  If that is true, then it
> appears that the subreg pass has dropped the insn setting
> pseudo-register 19.  I have no idea how that could happen; it suggests
> a bug in your backend port when handling a movqi of a subreg.
>
> Ian


Enable debug info

2014-01-29 Thread Umesh Kalappa
Dear All,

We need to support emitting debug info for our private port on GCC 4.8.1.

I was under the impression that using the option -g on the command line
would, by default, emit the DWARF debugging symbols and info, but I was
wrong here.

Could anyone in the group point me to some references or shed some light
on how to enable debug options in the compiler?

I appreciate your comments. Thank you
~Umesh


type promotion

2014-01-29 Thread Umesh Kalappa
Hi All,

I am porting GCC 4.8.1 to a private target which has 8-bit registers
that can be used as pairs for 16-bit values, like AB or CD but not BC
or AD.

I am stuck on type promotion, for example:

int i;
unsigned char c;

int test ()
{
  i =c;
}

I defined the zero_extendqihi2 pattern for the above C construct like this:

(define_expand "zero_extendqihi2"
  [(set (match_operand:HI 0 "" "")
        (zero_extend:HI (match_operand:QI 1 "" "")))]
  ""
{
  if (!reload_completed)
    {
      if (!REG_P (operands[1]))
        operands[1] = force_reg (QImode, operands[1]);

      /* Here I need to force GCC to use the next consecutive paired
         register, e.g. B if operands[1] is in A, or D if operands[1]
         is in C.  */
    }
})

How do I model the above requirement in the backend?


Thank you
~Umesh


Reg Alloc Problem.

2014-03-12 Thread Umesh Kalappa
Hi All,

We are porting GCC 4.8.1 to a new target which has paired 16-bit
registers such as AB, CD or EF. We modeled them in reg_class as follows:
AB, CD and DE as 16-bit pair_regs; CD and EF as 16-bit base_regs; and
A, B, C, D, E and F as 8-bit general_regs.

We are stuck on the following issues:

1) How do we model this so that the register allocator picks the
respective base_regs, i.e. CD, DE, instead of AB, as shown in the case
below?

LD AB ,_a;      // invalid; LD CD ,_a should be emitted instead

LD (AB),#100;   // invalid; LD (CD),#100 should be emitted instead


Please note that we have overridden target macros such as
REGNO_REG_CLASS, but still no luck.

2) The target restricts which register pairs can be used for
multiplication, e.g.

MUL A,B  or MUL C,D or  MUL E,F

but not MUL A,C or MUL B,C etc., i.e. not across the pair_regs.


Could anyone please shed some light here? It would be appreciated and
would help us greatly.

Thank you for your patience

~Umesh
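
To make the two restrictions above concrete, here is a small self-contained C model. It is not GCC code: the register numbering (A=0 ... F=5) and the helper names are made up for illustration, and it assumes the base pairs are CD and EF. In a real port, to my knowledge, the first predicate is the kind of condition that BASE_REG_CLASS and REGNO_OK_FOR_BASE_P express, and the second would need its own register class or constraint per multiplication pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hard register numbers for the 8-bit registers. */
enum hard_reg { REG_A, REG_B, REG_C, REG_D, REG_E, REG_F, NUM_REGS };

/* A 16-bit pair is identified by its first (even-numbered) register. */
static bool is_pair_start(int regno)
{
    return regno >= 0 && regno < NUM_REGS && (regno % 2) == 0;
}

/* Only CD and EF may hold an address (assumption); AB is excluded. */
bool regno_ok_for_base(int regno)
{
    return is_pair_start(regno) && regno != REG_A;
}

/* MUL accepts only the two halves of one pair: A,B or C,D or E,F. */
bool mul_operands_ok(int regno1, int regno2)
{
    return is_pair_start(regno1) && regno2 == regno1 + 1;
}

/* Self-check against the examples in the message. */
void check_reg_class_sketch(void)
{
    assert(!regno_ok_for_base(REG_A));    /* LD (AB),#100 is invalid */
    assert(regno_ok_for_base(REG_C));     /* LD (CD),#100 is valid   */
    assert(mul_operands_ok(REG_C, REG_D));
    assert(!mul_operands_ok(REG_B, REG_C));
}
```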


New .rodata section.

2014-03-12 Thread Umesh Kalappa
Hi All ,

We are porting GCC 4.8.1 to a new target, and we created a new .rodata
section with the appropriate flags using get_unnamed_section().

Now we need to associate global %object data of type .word or .byte
with the created .rodata section, and we also need to emit the .rodata
section in the asm file.

If anyone can share their experience, some input or a reference for
this, it would be appreciated.

Thank you for your patience.
~Umesh


Re: Reg Alloc Problem.

2014-03-14 Thread Umesh Kalappa
Hi All,

Regarding the problem below, i.e. making a specific set of registers the
base registers, where that set is a subset of the general register set:

we see the following in the *.c.208.ira log:

Pass 0 for finding pseudo/allocno costs


r21: preferred BASE_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS

a2 (r21,l0) best BASE_REGS, allocno GENERAL_REGS

r19: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS

a0 (r19,l0) best GENERAL_REGS, allocno GENERAL_REGS

r18: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS

a1 (r18,l0) best GENERAL_REGS, allocno GENERAL_REGS



  a0(r19,l0) costs: LOW_8BIT_REGS:0 BASE_REGS:0 GENERAL_REGS:0 ALL_REGS:0 MEM:8

  a1(r18,l0) costs: LOW_8BIT_REGS:0 BASE_REGS:0 GENERAL_REGS:0 ALL_REGS:0 MEM:8

  a2(r21,l0) costs: LOW_8BIT_REGS:2 BASE_REGS:0 GENERAL_REGS:4 ALL_REGS:4 MEM:8


Here IRA chooses GENERAL_REGS over BASE_REGS (the preferred class) for
the r21 pseudo. I am looking for the cause in our backend, but in the
meantime, if anyone in the group can share their experience with this,
it would help me solve the issue quickly.

Thank you
~Umesh


On Wed, Mar 12, 2014 at 7:30 PM, Umesh Kalappa  wrote:
> Hi All,
>
> We are porting the gcc 4.8.1 to the new target and which has the pair
> 16 bit registers  like AB or CD or EF   and we modeled  it in
> reg_class as AB,CD and DE 16 bit pair_regs and CD ,EF as 16 bit
> base_regs and A,B,C,D E  and F as 8 bit as general_regs.
>
> We are stuck with below issues like
>
> 1)How do we modelled such that the register alloc to pick the
> respective  base_regs i.e CD,DE  instead of AB as show in the below
> case
>
> LD AB ,_a;//invalid instead of it should be emit LD CD ,_a
>
> LD (AB),#100;  // invalid instead of it should be emit LD (CD),#100
>
>
> Please note  that we override  the target hook like REGNO_REG_CLASS
> ,but still no luck here .
>
>  2)Current target enforce the restrictions on  the pair register set
> usage for multiplication  like
>
> MUL A,B  or MUL C,D or  MUL E,F
>
> But not MUL A,C or MUL  B,C  etc not across the pair_regs .
>
>
> Anyone can  please shed some lights here ,will be appreciate  and help
> us in the great way .
>
>  Thank you for the patience
>
> ~Umesh


RTL Optimisations

2014-03-25 Thread Umesh Kalappa
Dear  All,

GCC 4.8.1 synthesizes some double-word operations (SI mode), such as
add and sub, from the word-size (HI) patterns, as in the case below:

(code snippet)
expand_binop_directly function in optabs.c.

/* These can be done a word at a time by propagating carries.  */
if ((binoptab == add_optab || binoptab == sub_optab)
    && mclass == MODE_INT
    && GET_MODE_SIZE (mode) >= 2 * UNITS_PER_WORD
    && optab_handler (binoptab, word_mode) != CODE_FOR_nothing)
(code snippet)

Our private target port satisfies the above conditions, hence the
compiler synthesizes the double-word operations from word operations.

We would like to prevent this optimisation. The reason is that we have
some optimised intrinsic functions, coded in assembly, and we want the
compiler to emit calls to these routines instead of synthesizing the
double-word operations.

If anyone in the group can shed some light here, it would be appreciated.

Thank you
~Umesh
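
For readers unfamiliar with the synthesis described above, this is what "a word at a time by propagating carries" amounts to, written as a plain C sketch for a 32-bit add built from 16-bit adds. The function name is made up for illustration; it models the arithmetic only, not the RTL that optabs.c generates.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 32-bit values using only 16-bit additions plus a carry. */
uint32_t add_si_via_hi_words(uint32_t x, uint32_t y)
{
    uint16_t x_lo = (uint16_t)x, x_hi = (uint16_t)(x >> 16);
    uint16_t y_lo = (uint16_t)y, y_hi = (uint16_t)(y >> 16);

    /* Low word: a plain 16-bit add that may wrap around. */
    uint16_t lo = (uint16_t)(x_lo + y_lo);

    /* The add wrapped exactly when the result is smaller than an input. */
    uint16_t carry = (uint16_t)(lo < x_lo);

    /* High word: 16-bit add of the high halves plus the carry. */
    uint16_t hi = (uint16_t)(x_hi + y_hi + carry);

    return ((uint32_t)hi << 16) | lo;
}

/* Self-check against the native 32-bit addition. */
void check_carry_sketch(void)
{
    assert(add_si_via_hi_words(0x0001FFFFu, 1u) == 0x00020000u);
    assert(add_si_via_hi_words(0xFFFFFFFFu, 1u) == 0u);
    assert(add_si_via_hi_words(0x12345678u, 0x11111111u) == 0x23456789u);
}
```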


Re: RTL Optimisations

2014-03-26 Thread Umesh Kalappa
Georg,


Currently we have implemented the expander, where argument passing and
the return type are taken care of in the expander along with emitting
the call insn.

Do you have any suggestion here, such as another practical approach?

I appreciate your reply.

Thank you in advance
~Umesh

On Wed, Mar 26, 2014 at 7:42 PM, Georg-Johann Lay  wrote:
> Am 03/25/2014 01:28 PM, schrieb Jeff Law:
>
>> On 03/25/14 06:23, Umesh Kalappa wrote:
>>>
>>> Dear  All,
>>>
>>> The GCC source reference 4.8.1 will synthesized some of the double
>>> word operations(SI mode) like add /sub in the below case from the word
>>> size (HI) patterns,
>>>
>>> (code snippet)
>>> expand_binop_directly function in the optabs.c.
>>>
>>> /* These can be done a word at a time by propagating carries. */
>>> 1949 if ((binoptab == add_optab || binoptab == sub_optab)
>>> 1950 && mclass == MODE_INT
>>> 1951 && GET_MODE_SIZE (mode) >= 2 * UNITS_PER_WORD
>>> 1952 && optab_handler (binoptab, word_mode) != CODE_FOR_nothing)
>>> (code snippet)
>>>
>>> Current private target port will hash the above conditions ,hence
>>> compiler  will synthesizes the double word operation w.r.t word
>>> operations .
>>>
>>> We would like prevent this optimisations ,The reason for the same is
>>> we do have the  some optimised intrinsic functions ,which is coded in
>>> assemble and we want compiler to emit the intrinsic call  to these
>>> routines instead of synthesizes the double word operations.
>>>
>>> Anyone in the group can shed some lights here will be appreciated .
>>
>> Write an expander/pattern which calls your intrinsics.
>
>
> I don't think this is a good and very practical approach.
>
> Presumably it's just a flaw in the target RTX cost model.
>
> Johann
>


-fleading-underscore is not working as expected.

2014-04-02 Thread Umesh Kalappa
Dear All ,

I enabled the switch "-fleading-underscore" to emit global symbol names
with the prefix _.

The respective C source file:

int a=10;

int b=10,c;

int test()

{

c =a+b ;

tes();

return c ;

}

and the respective asm file:

.global _a

.section .data

.align  1

.type   _a, %object

.size   _a, 2

_a:

.word   10

.global _b

.align  1

.type   _b, %object

.size   _b, 2

_b:

.word   10

.comm   _c, 2,2

.section .text

.align  1

.global _test

.type   _test, %function

_test:

ld  HL, (a)

ld  DE, (b)

add DE, HL

ld  (c), DE

cal _tes

ld  DE, (c)

ld  WA, DE

ret


As the asm shows, the global symbol names are prefixed with _ in the
definitions, but not in the uses.

I'm sure we are missing something here with respect to the
-fleading-underscore flag. The GCC source is 4.8.1.

Any help would be appreciated.

Thank you
~Umesh


Code emitted was bloated with no optimisation.

2014-04-10 Thread Umesh Kalappa
Hi there,

We ported GCC 4.8.1 to our private target, and the code is bloated for
the array access as shown below.

C file :
int a[10];
int i;
test()
{
a[9] = 10;
a[i] = 20;
}

xgcc -O2 -S test.c

_test:
ld  (_a+18), 10 ;a[9] = 10;

ld  WA, (_i) ; a[i] = 20;
add WA, WA
add WA, _a
ld  HL, WA
ld  (HL), 20

ret
.comm   _i, 2,2
.comm   _a, 20,20


The generated code above looks much better than the code generated
below with no optimisation:

xgcc -S test.c

.comm   _a, 20,20
.comm   _i, 2,2

.type   _test, %function
_test:
sub SP, 4
ld  WA, 10
ld  (_a+18), WA  ; a[9] = 10;

ld  WA, (_i) ;code bloated here for a[i]
ld  IX, WA
ld  BC, 15
cal _C87C_shris
ld  IY, WA
ld  DE, WA
ld  HL, BC
ld  WA, IX
add WA, IX
ld  DE, WA
ld  WA, DE
ld  BC, HL
ld  HL, 1
ld  (SP+2), HL
cmp WA,IX
j   lt,_.L2
ld  DE, 0
ld  (SP+2), DE
.L2:
ld  DE, WA
ld  HL, BC
ld  BC, IY
add BC, IY
ld  HL, BC
ld  WA, DE
ld  BC, HL
ld  DE, (SP+2)
add DE, BC
ld  BC, DE
ld  HL, WA
ld  (SP+0), HL
ld  WA, 20
ld  HL, (SP+0)
ld  (a+i), WA
add SP, 4
ret

When the array is accessed with a constant index, i.e. a[9], the
generated code is fine.

But I could not track down why the code is bloated for the a[i] access.

Could somebody from the group please share their thoughts? It would be
appreciated.

Thank you
~Umesh
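
For reference, a small C model of the address arithmetic behind a[i] on a target with a 2-byte int and 16-bit addresses: the byte offset is i * 2, added to the address of a. The function names and fixed-width types are illustrative only. The second function does the same computation in 32 bits and truncates at the end; that is a guess at one way sign extension and carry handling can show up in unoptimised output, not a diagnosis of this port.

```c
#include <assert.h>
#include <stdint.h>

/* Address of a[i] computed entirely in 16 bits: base + i * 2. */
uint16_t element_address_16(uint16_t base, int16_t index)
{
    return (uint16_t)(base + (uint16_t)index * 2u);
}

/* Same address, but with the index widened to 32 bits first and the
   result truncated at the end; the low 16 bits are identical. */
uint16_t element_address_32(uint16_t base, int16_t index)
{
    int32_t wide_index = index;                   /* sign extension */
    uint32_t offset = (uint32_t)wide_index * 2u;  /* 32-bit scaling */
    return (uint16_t)((uint32_t)base + offset);   /* 32-bit add     */
}

/* Self-check: both routes agree, including for a negative index. */
void check_address_sketch(void)
{
    assert(element_address_16(0x1000, 9) == 0x1012);
    assert(element_address_32(0x1000, 9) == 0x1012);
    assert(element_address_16(0x1000, -1) == 0x0FFE);
    assert(element_address_32(0x1000, -1) == 0x0FFE);
}
```

Since both routes give the same 16-bit address, doing the arithmetic in the wider mode buys nothing here and only costs instructions.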


Re: Code emitted was bloated with no optimisation.

2014-04-11 Thread Umesh Kalappa
Hi Andrew,

I appreciate your reply. Yes, unoptimized code is expected to be large,
but for a construct like a[i] the generated code below looks crazy to
me.

 ld  WA, 10
ld  (_a+18), WA  ; a[9] = 10;

ld  WA, (_i) ;code bloated here for a[i]
ld  IX, WA
ld  BC, 15
cal _C87C_shris
ld  IY, WA
ld  DE, WA
ld  HL, BC
ld  WA, IX
add WA, IX
ld  DE, WA
ld  WA, DE
ld  BC, HL
ld  HL, 1
ld  (SP+2), HL
cmp WA,IX
j   lt,_.L2
ld  DE, 0
ld  (SP+2), DE
.L2:
ld  DE, WA
ld  HL, BC
ld  BC, IY
add BC, IY
ld  HL, BC
ld  WA, DE
ld  BC, HL
ld  DE, (SP+2)
add DE, BC
ld  BC, DE
ld  HL, WA
ld  (SP+0), HL
ld  WA, 20
ld  HL, (SP+0)
ld  (a+i), WA

Any idea why this is so?

Thank you in advance
~Umesh

On Thu, Apr 10, 2014 at 8:54 PM, Andrew Haley  wrote:
> On 04/10/2014 04:12 PM, Umesh Kalappa wrote:
>
>> Please somebody from the group can share their thoughts and will be
>> appricate the same.
>
> But unoptimized code is expected to be large.  Why do you expect
> otherwise?
>
> Andrew.
>


Re: Code emitted was bloated with no optimisation.

2014-04-11 Thread Umesh Kalappa
Richard,
Pmode is defined as HImode. The private target is 16-bit: int, short
and Pmode are HImode, and long is SImode.

Please let me know if more information about the target is required.

Thank you
~Umesh

On Fri, Apr 11, 2014 at 4:35 PM, Richard Sandiford
 wrote:
> Andrew Haley  writes:
>> On 04/10/2014 04:12 PM, Umesh Kalappa wrote:
>>
>>> Please somebody from the group can share their thoughts and will be
>>> appricate the same.
>>
>> But unoptimized code is expected to be large.  Why do you expect
>> otherwise?
>
> Sure, but this is a bit extreme.  I don't see off-hand how a[i]
> would generate a branch, for starters.
>
> But it's very hard to answer this kind of question for a private port.
> How is Pmode defined?  If it's a partial integer mode like PSImode,
> is the problem that the arithmetic is being done in SImode?
>
> Thanks,
> Richard
>


RTL insns set differences

2013-01-30 Thread Umesh Kalappa
Dear Group,
I need a favour from you all. I'm very new to the GCC framework and am
still learning it.
I was looking at the RTL insns in the c.144.expand dump file, before the
reload pass, for various targets, and I see a difference in the RTL
insns between two different targets for the sample code below:

test.c

int func ()

{ int r =10;

int d =r ;

return r+d;

}

RTL Insns for Target -1(test.c.144.expand)

;; r_1 = 10;

(insn 5 4 6 (set (reg:SI 136)
(const_int 10 [0xa])) test7.c:3 -1
(nil))

(insn 6 5 0 (set (mem/c/i:SI (plus:SI (reg/f:SI 129 virtual-stack-vars)
(const_int -4 [0xfffc])) [0 r+0 S4 A32])
(reg:SI 136)) test7.c:3 -1
(nil))

;; d_2 = r_1;

(insn 7 6 8 (set (reg:SI 137)
(mem/c/i:SI (plus:SI (reg/f:SI 129 virtual-stack-vars)
(const_int -4 [0xfffc])) [0 r+0 S4 A32])) test7.c:4 -1
(nil))

(insn 8 7 0 (set (mem/c/i:SI (plus:SI (reg/f:SI 129 virtual-stack-vars)
(const_int -8 [0xfff8])) [0 d+0 S4 A32])
(reg:SI 137)) test7.c:4 -1
(nil))

;; D.1269_3 = r_1 + d_2;

(insn 9 8 10 (set (reg:SI 138)
(mem/c/i:SI (plus:SI (reg/f:SI 129 virtual-stack-vars)
(const_int -4 [0xfffc])) [0 r+0 S4 A32])) test7.c:5 -1
(nil))

(insn 10 9 11 (set (reg:SI 139)
(mem/c/i:SI (plus:SI (reg/f:SI 129 virtual-stack-vars)
(const_int -8 [0xfff8])) [0 d+0 S4 A32])) test7.c:5 -1
(nil))

(insn 11 10 0 (set (reg:SI 134 [ D.1269 ])
(plus:SI (reg:SI 138)
(reg:SI 139))) test7.c:5 -1
(nil))

;; return D.1269_3;

(insn 12 11 13 (set (reg:SI 135 [  ])
(reg:SI 134 [ D.1269 ])) test7.c:5 -1
(nil))

(jump_insn 13 12 14 (set (pc)
(label_ref 0)) test7.c:5 -1
(nil))


RTL insns for Target-2(test.c.144.expand)


;; r_1 = 20;

(insn 5 4 0 (set (mem/c/i:SI (reg/f:SI 17 virtual-stack-vars) [0 r+0 S4 A32])
(const_int 20 [0x14])) test.c:3 -1
(nil))

;; d_2 = r_1;

(insn 6 5 0 (set (mem/c/i:SI (plus:SI (reg/f:SI 17 virtual-stack-vars)
(const_int 4 [0x4])) [0 d+0 S4 A32])
(mem/c/i:SI (reg/f:SI 17 virtual-stack-vars) [0 r+0 S4 A32])) test.c:4 -1
(nil))

;; D.1199_3 = r_1 + d_2;

(insn 7 6 8 (set (reg:SI 24)
(mem/c/i:SI (reg/f:SI 17 virtual-stack-vars) [0 r+0 S4 A32])) test.c:5 -1
(nil))

(insn 8 7 9 (set (reg:SI 25)
(mem/c/i:SI (plus:SI (reg/f:SI 17 virtual-stack-vars)
(const_int 4 [0x4])) [0 d+0 S4 A32])) test.c:5 -1
(nil))

(insn 9 8 0 (set (reg:SI 22 [ D.1199 ])
(plus:SI (reg:SI 24)
(reg:SI 25))) test.c:5 -1
(nil))

;; return D.1199_3;

(insn 10 9 11 (set (reg:SI 23 [  ])
(reg:SI 22 [ D.1199 ])) test.c:5 -1
(nil))

(jump_insn 11 10 12 (set (pc)
(label_ref 0)) test.c:5 -1
(nil))

I have some queries for the group:

First, as per the GCC GIMPLE-to-RTL conversion, the RTL insns should be
the same for both targets. Am I right here, or am I missing something?

Second, if I am wrong on the first point, I need to emit RTL insns for
target-2 similar to those for target-1. Could someone from the group
please guide me on how to do that?

Thanks in Advance

~Umesh


Passing the complex args in the GPR's

2023-06-06 Thread Umesh Kalappa via Gcc
Hi all ,

For the test case https://godbolt.org/z/vjs1vfs5W we see a mismatch in
the ABI between GCC and Clang.

Do we have any supporting documents that back the GCC behaviour over Clang's?

The EABI states the following.

In the Power Architecture 64-Bit ELF V2 ABI Specification document
(v1.1 from 16 July 2015)

Page 53:

Map complex floating-point and complex integer types as if the
argument was specified as separate real
and imaginary parts.

In this case the double complex arguments are broken down into a double
real part and a double imaginary part, and are expected to be passed in
FPRs, not GPRs.



Thank you
~Umesh
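
For readers without access to the link, a minimal sketch of the kind of function in question (not the exact test case; the function names are made up). In C a double _Complex has the same layout as an array of two doubles, real part first, and the ELFv2 wording quoted above treats such an argument as if the two doubles had been passed separately.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Takes the argument as a single complex value. */
double sum_of_parts(double _Complex z)
{
    double parts[2];                  /* [0] = real, [1] = imaginary */
    memcpy(parts, &z, sizeof parts);  /* same layout as double[2]    */
    return parts[0] + parts[1];
}

/* The "as if specified as separate real and imaginary parts" form. */
double sum_of_parts_split(double re, double im)
{
    return re + im;
}

/* Self-check: both forms see the same two values. */
void check_complex_sketch(void)
{
    double _Complex z = 1.5 + 2.5 * I;
    assert(sum_of_parts(z) == 4.0);
    assert(sum_of_parts_split(1.5, 2.5) == 4.0);
}
```

Whether the two doubles travel in FPRs, GPRs or memory is exactly what the ABI for the target in use decides; the C source looks the same either way.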


Re: Passing the complex args in the GPR's

2023-06-06 Thread Umesh Kalappa via Gcc
Hi Andrew,
Thank you for the quick response. For PPC64 too, we have ABI mismatches
for complex operations, e.g.
https://godbolt.org/z/bjsYovx4c .

Any reason why GCC chose to use GPRs here?

~Umesh

On Tue, Jun 6, 2023 at 8:28 PM Andrew Pinski  wrote:
>
> On Tue, Jun 6, 2023 at 7:50 AM Umesh Kalappa via Libc-alpha
>  wrote:
> >
> > Hi all ,
> >
> > For the test case https://godbolt.org/z/vjs1vfs5W ,we see the mismatch
> > in the ABI b/w gcc and clang .
> >
> > Do we have any supporting documents that second the GCC behaviour over 
> > CLANG ?
> >
> > EABI states like
> >
> > In the Power Architecture 64-Bit ELF V2 ABI Specification document
> > (v1.1 from 16 July 2015)
>
> You are looking at the wrong ABI document.
> That is for the 64bit ABI.
> The 32bit ABI document is located at:
> http://refspecs.linux-foundation.org/elf/elfspec_ppc.pdf
>
> Plus the 32bit ABI document does not document Complex argument passing
> as it was written in 1995 and never updated.
>
> https://www.nxp.com/docs/en/reference-manual/E500ABIUG.pdf does not
> document it either.
>
> Thanks,
> Andrew Pinski
>
> >
> > Page 53:
> >
> > Map complex floating-point and complex integer types as if the
> > argument was specified as separate real
> > and imaginary parts.
> >
> > and in this case the double complexes are broken down with double real
> > and double img and expected to pass in FPR not the GPR.
> >
> >
> >
> > Thank you
> > ~Umesh


Re: Passing the complex args in the GPR's

2023-06-06 Thread Umesh Kalappa via Gcc
Hi Segher ,

>>What did you expect, what happened instead?
For example, the complex args are passed in GPRs for cexp in the case of
GCC, while Clang uses caller memory.

For reference: https://godbolt.org/z/MfMz3cTe7

We have cross tools where some of the libraries are built using GCC and
some using Clang.

We approached the Clang developers about this behaviour (why the stack
and not the FPRs, as on PPC64). They are not going to change it and
asked us to refer back to GCC, hence this email thread.

The question is: why does GCC choose to use GPRs here, and is there any
reference to support this decision?

Thank you
~Umesh



On Tue, Jun 6, 2023 at 10:16 PM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Tue, Jun 06, 2023 at 08:35:22PM +0530, Umesh Kalappa wrote:
> > Hi Adnrew,
> > Thank you for the quick response and for PPC64 too ,we do have
> > mismatches in ABI b/w complex operations like
> > https://godbolt.org/z/bjsYovx4c .
> >
> > Any reason why GCC chose to use GPR 's here ?
>
> What did you expect, what happened instead?  Why did you expect that,
> and why then is it an error what did happen?
>
> You used -O0.  As long as the code works, all is fine.  But unoptimised
> code frequently is hard to read, please use -O2 instead?
>
> As Andrew says, why did you use -m32 for GCC but -m64 for LLVM?  It is
> hard to compare those at all!  32-bit PowerPC Linux ABI (based on 32-bit
> PowerPC ELF ABI from 1995, BE version) vs. 64-bit ELFv2 ABI from 2015
> (LE version).
>
>
> Segher