Looking at UNSUPPORTED dejagnu tests for a port...

2021-03-30 Thread Alan Lehotsky
I’m doing some final polishing on a gcc 8.3 upgrade and taking a look at the 
unsupported tests.   Most of them are completely sensible (my port doesn’t 
support trampolines, for example).  But gcc.c-torture/execute/pr78622.c is 
marked as unsupported.  That appears to be due to the line

   { dg-require-effective-target c99_runtime }

I’m using newlib, and if I manually compile the test case with or without an 
explicit -std=c99, it compiles and links without error.
Do I need to set something in the baseboards file or in a local .exp file to 
indicate that c99 is okay?
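
As a quick sanity check outside DejaGnu, a small probe like the one below could be built with the cross toolchain and run on the simulator. It is only a sketch: it assumes the C99 features of interest are the `%hhd` and `%zu` printf conversions, `c99_printf_probe` is a made-up name, and it does not reproduce what the `c99_runtime` effective-target check itself does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical probe: returns 1 if the C library formats the C99
   "%hhd" and "%zu" conversions as the standard requires, else 0.  */
int c99_printf_probe(void)
{
    char buf[32];

    /* "%hhd" converts its int argument to signed char before printing. */
    snprintf(buf, sizeof buf, "%hhd", 65);
    if (strcmp(buf, "65") != 0)
        return 0;

    /* "%zu" prints a size_t. */
    snprintf(buf, sizeof buf, "%zu", (size_t)7);
    return strcmp(buf, "7") == 0;
}
```

If the probe passes on the target, the library side is probably fine and the UNSUPPORTED result is more likely coming from how the testsuite decides the answer than from newlib itself.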



Turning off SRA

2021-01-21 Thread Alan Lehotsky
I’m working on performance tuning a gcc 8.3 port and wanted to turn off SRA for 
an experiment.  I’m passing both

 -fno-tree-sra
 -fno-ipa-sra

but it’s still tagging compiled functions with an “_isra” suffix, which would 
seem to indicate that it’s still running that optimization.

Is there a bigger hammer I’m missing?


Alan Lehotsky
https://codegentllc.com





Trying to chase down a scheduler bug in gcc 4.4.1

2020-10-16 Thread Alan Lehotsky
I’m in the process of upgrading a gcc port, but my client is using a gcc 4.4.1 
port right now and has run into a scheduler bug.  This seems to have been fixed 
at some point, as the 8.3.1 code base doesn’t seem to have the bug.  But they’d 
like a fix on their 4.4.1 base.

Basically, what I see is a block of code where we have

struct pnode * pn = ctx->return_pn;

atomic_write_u32((unsigned int*)&ctx->return_pn, 0);

x = pn-> x;

where the ‘atomic_write_u32’ is an extended asm that is basically

static inline void atomic_write_u32( unsigned int *reg, unsigned int v) {
  asm ( "move %[dest], %[src]\n"
        : [dest] "=m" (*reg)
        : [src] "r" (v));
}

What happens is that the load of the local pn gets motioned AFTER the 
assignment of zero to the passed in reference of ctx->return_pn, and we SEGV at 
runtime dereferencing a NULL pointer.

I’ve checked the phase dumps and everything’s fine in the RTL until sched1, 
where we end up with

;;== 
;;   -- basic block 2 from 2 to 13 -- before reload
;;   ==

;;  0--> 2 r50=d0                          :i_pipeline
;;  1--> 7 r51=0x0                         :i_pipeline
;;  2--> 8 [r50+0x9c]=asm_operands         :nothing
;;  3--> 6 r47=[r50+0x9c]                  :i_pipeline   <=== moved load of pn after code that zeroes it
;;  4--> 9 d0=r47                          :i_pipeline
;;  5-->10 call [`pnode_ref_dec']          :i_pipeline
;;  6-->12 {cc=cmp([r50+0xb0],0x0);r49=[r50+0   :i_pipeline
;;  7-->13 pc={(cc==0x0)?L23:pc}           :i_pipeline
;; Ready list (final):  
;;   total time = 7
;;   new head = 2
;;   new tail = 13

I grovelled thru Bugzilla, couldn’t find anything that seemed relevant using 
search terms like “sched”, “haifa”, “asm”.  I’m hoping that someone might 
recognize this problem and point me to a relevant Bugzilla report before I dig 
into the schedule pass to try and see why it goes wrong.

I’m guessing that the pass is not recognizing that &ctx->return_pn in the 
caller and *reg in the inline asm alias, and so thinks it’s safe to motion a 
reference to ctx->return_pn...
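
One possible stopgap while the scheduler itself is being investigated is to make the ordering explicit with compiler barriers. The sketch below is an assumption-laden illustration, not a fix for the sched1 problem: `write_ptr_with_barrier`, `struct context` and `take_return_pn` are made-up names, the target's `move` instruction is replaced by a plain volatile store, and whether this actually avoids the misscheduling on the 4.4.1 port is untested.

```c
#include <stddef.h>

struct pnode { int x; };
struct context { struct pnode *return_pn; };

/* Hypothetical replacement for the store: a volatile write bracketed by
   compiler barriers.  The "memory" clobber tells GCC the asm may touch
   any memory, so loads and stores should not be moved across it.  */
static inline void write_ptr_with_barrier(struct pnode **slot, struct pnode *v)
{
    __asm__ __volatile__("" ::: "memory");
    *(struct pnode * volatile *)slot = v;
    __asm__ __volatile__("" ::: "memory");
}

/* Same shape as the failing code: read the pointer, clear the slot,
   then dereference the saved copy.  */
int take_return_pn(struct context *ctx)
{
    struct pnode *pn = ctx->return_pn;
    write_ptr_with_barrier(&ctx->return_pn, NULL);
    return pn->x;
}
```

The same "memory" clobber could be added to the original extended asm instead; the barrier costs some scheduling freedom around the store, which is the point here.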

Re: Hoisting DFmode loads out of loops..

2020-06-25 Thread Alan Lehotsky
On Jun 25, 2020, at 6:37 PM, Jeff Law <l...@redhat.com> wrote:

On Thu, 2020-06-25 at 15:46 -0400, Alan Lehotsky wrote:
I’m working on a GCC 8.3 port to a load/store architecture with a 32-bit 
data-path between registers and memory;

looking at the gcc.dg/loop-9.c test, I fail to pass because I have split the 
move of a double constant to memory into multiple moves (4 in fact, because I 
only have a 16-bit immediate mode.)

The (define_insn_and_split “movdf” …) is conditioned on “reload_completed”.

Is there some other trick I need to get the constant hoisted?  I have already 
set the rtx cost of the CONST_DOUBLE ridiculously high (like 10 insns).
Hi Alan, it's been a long time...

We'd probably need to see the RTL.  A variety of things can get in the way of
LICM.  For example, I'd expect subregs to be problematical because they can look
like RMW operations.

jeff



Hello to you too, Jeff….   I’ve been lurking for the last decade or so; the last 
port I actually did was GCC 4 based, so there’s lots of new stuff to try and wrap 
my head around.  I certainly am grateful to anybody with suggestions as to how 
to track down this problem (I’m not terribly eager to step thru an x86 gcc in 
parallel with my port to see where they diverge in the loop-invariant 
recognition.)

Although in crafting this expanded email, I see that the x86 has already 
decided to store the constant 18.4242 in the .rodata section by the start of 
loop-invariance so there’s a

(set (reg:DF…. ) (mem:DF  (symbol_ref ….)))

and I bet that’s far easier to move out of the loop than it would be to split 
the original

(set (mem:DF…) (const_double:DF ….))

— Al

==

Source code is

void f (double *a)
{
  int i;
  for (i = 0; i < 100; i++)
    a[i] = 18.4242;
}
==

Here’s the dump from loop-9.c.252r.loop2-invariant  (compiled -O1)


;; Function f (f, funcdef_no=0, decl_uid=1458, cgraph_uid=0, symbol_order=0)

*starting processing of loop 1 **
starting the processing of deferred insns
ending the processing of deferred insns
setting blocks to analyze 3, 5
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 ( 0.33)
df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 ( 0.33)
df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 3 (  0.5)


starting region dump


f

Dataflow summary:
def_info->table_size = 3, use_info->table_size = 23
;;  invalidated by call 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6 [d6] 7 [d7] 
8 [d8] 9 [d9] 14 [d14] 15 [d15] 16 [a0] 19 [a3] 20 [a4] 24 [acc0_hi] 25 
[acc0_lo] 26 [acc1_hi] 27 [acc1_lo] 28 [source3] 30 [cc] 31 [int_set0] 32 
[int_set1] 33 [int_clr0] 34 [int_clr1] 35 [scratchpad0] 36 [scratchpad1] 37 
[scratchpad2] 38 [scratchpad3]
;;  hardware regs used 23 [sp] 29 [arg] 39 [sfp]
;;  regular block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
;;  eh block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
;;  entry block defs 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6 [d6] 7 [d7] 8 
[d8] 9 [d9] 21 [a5] 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
;;  exit block uses 22 [a6] 23 [sp] 39 [sfp]
;;  regs ever live 0 [d0] 30 [cc]
;;  ref usage r0={1d,1u} r1={1d} r2={1d} r3={1d} r4={1d} r5={1d} r6={1d} 
r7={1d} r8={1d} r9={1d} r21={1d} r22={1d,5u} r23={1d,5u} r29={1d,4u} 
r30={3d,1u} r39={1d,5u} r46={2d,4u} r48={1d,1u}
;;total ref usage 47{21d,26u,0e} in 6{6 regular + 0 call} insns.
;; Reaching defs:
;;  sparse invalidated
;;  dense invalidated 0, 1
;;  reg->defs[] map: 30[0,1] 46[2,2]
;; bb 3 artificial_defs: { }
;; bb 3 artificial_uses: { u7(22){ }u8(23){ }u9(29){ }u10(39){ }}
;; lr  in   22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
;; lr  use 22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
;; lr  def 30 [cc] 46
;; live  in   46
;; live  gen 30 [cc] 46
;; live  kill 30 [cc]
;; rd  in   (1) 46[2]
;; rd  gen (2) 30[1],46[2]
;; rd  kill (3) 30[0,1],46[2]
;;  UD chains for artificial uses at top

(code_label 11 7 8 3 2 (nil) [0 uses])
(note 8 11 9 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
;;   UD chains for insn luid 0 uid 9
;;  reg 46 { d2(bb 3 insn 10) }
(insn 9 8 10 3 (set (mem:DF (reg:SI 46 [ ivtmp___6 ]) [0 MEM[base: _15, offset: 
0B]+0 S8 A32])
(const_double:DF 1.842419990222931955941021442413330078125e+1 
[0x0.9364c2f837b4ap+5])) "loop-9.c":9 19 {movdf}
 (nil))
;;   UD chains for insn luid 1 uid 10
;;  reg 46 { d2(bb 3 insn 10) }
(insn 10 9 12 3 (parallel [
(set (reg:SI 46 [ ivtmp___6 ])
(plus:SI (reg:SI 46 [ ivtmp___6 ])
(const_int 8 [0x8])))
(clobber (reg:CC 30 cc))
]) 81 {addsi3_1v5}
 (expr_list:REG_UNUSED (reg:CC 30 cc)
(nil)))
;;   UD chains for insn luid 2 uid 12
;;  reg 46 { d2(bb 3 insn 10) }
;;  reg 48 { }
(insn 12 10 13 3 (set (reg:CCWZ 30 cc)
(compare:CCWZ (reg:SI 46 [ ivtmp___6 ])
(reg:S

Hoisting DFmode loads out of loops..

2020-06-25 Thread Alan Lehotsky
I’m working on a GCC 8.3 port to a load/store architecture with a 32-bit 
data-path between registers and memory;  

looking at the gcc.dg/loop-9.c test, I fail to pass because I have split the 
move of a double constant to memory into multiple moves (4 in fact, because I 
only have a 16-bit immediate mode.)

The (define_insn_and_split “movdf” …) is conditioned on “reload_completed”.

Is there some other trick I need to get the constant hoisted?  I have already 
set the rtx cost of the CONST_DOUBLE ridiculously high (like 10 insns).
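
To make the "4 moves" concrete, here is a small host-side sketch of how a 64-bit double constant breaks into four 16-bit immediates. It assumes an IEEE-754 64-bit `double`; the function names are made up for illustration and say nothing about how the port's splitter is actually written.

```c
#include <stdint.h>
#include <string.h>

/* Split the bit pattern of a double into four 16-bit pieces,
   most significant piece first.  */
void split_double_imm16(double value, uint16_t pieces[4])
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);   /* well-defined type pun */
    for (int i = 0; i < 4; i++)
        pieces[i] = (uint16_t)(bits >> (48 - 16 * i));
}

/* Reassemble the pieces; used only to check that the split round-trips.  */
double join_double_imm16(const uint16_t pieces[4])
{
    uint64_t bits = 0;
    double value;
    for (int i = 0; i < 4; i++)
        bits = (bits << 16) | pieces[i];
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Each of the four pieces becomes its own immediate move, which is why keeping the constant in one invariant pseudo before the split matters for the loop.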


Alan Lehotsky
https://codegentllc.com





connecting a QEMU VM to dejagnu...

2019-10-16 Thread Alan Lehotsky
I’m trying to grapple with connecting dejagnu to a QEMU simulator; not finding 
any obvious examples to work with.

I’ve had a lot of familiarity using CGEN simulators connected to dejagnu, but 
QEMU’s a new breed of cat….

Can anyone point me to a boards/.exp that is based on using QEMU, or 
provide other examples.  

The one example I found via a web search seems to want to do everything in the 
virtual machine - but I have to believe that’s going to be insanely slow…




setting include paths for a cross compiler in gcc 3.4.6

2015-08-06 Thread Alan Lehotsky
I have a funny situation where I’m trying to build a cross compiler for x86 
hosted on x86 where I’d like to use the native headers and libraries. 

I tried defining INCLUDE_DEFAULTS, and that didn’t help.  The documentation 
says it’s ignored for cross compilers.

Any suggestions, or am I going to have to fool the configuration scripts into 
thinking this is a host=target configuration?



How to upgrade a tool-chain tree...

2014-12-28 Thread Alan Lehotsky
I have a tool chain for an experimental processor, built starting with release 
or snapshot distributions of

binutils-2.21
cgen-20110901
gcc-4.6.1
newlib-1.19.0
gdb 7.2

I'm using SVN for version control locally.

I'd like to upgrade it to a newer source base; but since it wasn't done by 
checking out from the FSF and Sourceware version-control systems, I'm wondering 
if there's any clever way to merge my tree or if I just have to bite the bullet 
and essentially take everything I've modified since version 1 in my source tree 
and import them into a new source tree?




Re: delay slot of conditionnal branch with no annuled jump strategy

2013-10-10 Thread Alan Lehotsky
I have a gcc 4.6.1 port that has the same sort of  problems.  I tried 
selectively porting some patches from later 4.6 releases, but they didn't seem 
to actually address the issue.  I haven't looked at the trunk to see if there 
are patches that are more apropos.

On Oct 10, 2013, at 12:33 PM, Jeff Law l...@redhat.com wrote:

 On 10/10/13 07:31, BELBACHIR Selim wrote:
 
 Why GCC doesn't see, in this case, that it's not safe to fill the delay slot 
 with my compare insn (which is a parallel RTX which clobber one register 
 used in fallthrough branch) ?
 Is a processor 'annuled jump strategy' mandatory to handle delay slot of 
 conditionnal jump instructions ?
 You'd need to debug reorg.  reorg has code to track resources to avoid these 
 kind of issues.  You'd have to debug why it's not working as expected.
 
 annulling is not required for proper functioning of the reorg pass, it's 
 merely an optimization.
 
 I'd start by first verifying your delay slot descriptions do not allow 
 nullifying the delay slot.
 
 Jeff



Confusion about delay slots and using condition-code register

2013-03-06 Thread Alan Lehotsky
I'm using the CCmode model for condition-code handling in a 4.6.1 based 
compiler.  Every other port I've done used the CC0 model, so I'm probably doing 
something misguided here.

I'm down to just 170 failures in the check-gcc testsuites, so it's looking 
pretty solid; of the failures about 30 are tests with delay-slots being filled 
incorrectly.

The situation I see is where we have source that looks like
 
if (x != 0)
 count++;
if (y != z)
  .

The RTL (without delay slot considerations) looks like

        jeq  $1

        add  r1,1
$1:     cmp  r2,r3
        jeq  $2

Branches have delay slots, and are not annullable.  When reorg runs, it 
realizes that it can't put the add into the delay slot, but it hoists the cmp 
instruction into the first branch slot, a la

        jeq  $1
        cmp  r2,r3

        add  r1,1
$1:     jeq  $2
..

So, if the first branch is not taken, we set the condition codes needed for the 
second branch and clobber them with the add instruction then fall to the 
conditional branch using the wrong condition codes.


I emit (clobber (reg:CC CCreg))  with every instruction that can set condition 
codes, but it appears that nearly all of them are removed before we reach reorg 
where mark_referenced_resources() or mark_set_resources() would detect a 
conflict of the CCreg's.
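
The effect of a missing clobber can be shown with a toy dependence check. This is a simplified model with made-up names, assuming resources are tracked as bit masks; it is not how reorg's mark_set_resources() and mark_referenced_resources() are implemented, only the kind of test they enable.

```c
#include <stdbool.h>

enum { REG_R1 = 1u << 0, REG_R2 = 1u << 1, REG_R3 = 1u << 2, REG_CC = 1u << 3 };

/* Resources an instruction reads and writes, as bit masks.  */
struct insn_resources { unsigned uses; unsigned sets; };

/* True if `moved` may be hoisted above `crossed` without changing
   what either instruction reads or leaves behind.  */
bool can_hoist_across(struct insn_resources moved, struct insn_resources crossed)
{
    if (moved.sets & crossed.sets) return false;  /* output dependence */
    if (moved.sets & crossed.uses) return false;  /* anti dependence */
    if (moved.uses & crossed.sets) return false;  /* true dependence */
    return true;
}
```

With the add described as setting only r1, hoisting the cmp (which sets CC) across it looks safe; once the add is also recorded as setting CC, the output dependence blocks the move.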

So, am I constructing my RTL incorrectly?  Do I need to be making the clobbers 
inside a parallel instead of just emitting them sequentially?  Or should I just 
fall back to a cc0 model where this shouldn't be a problem?

The define_expand pattern for add looks like

(define_expand "add<S:mode>3"
  [(set (match_operand:S 0 "nonimmediate_operand")
        (plus:S (match_operand:S 1 "general_operand")
                (match_operand:S 2 "general_operand")))
   (clobber (reg:CC CC_REGNUM))]
  
  .
  })

The corresponding define_insn's are


(define_insn "*addsi"
 [(set (match_operand:SI  0 "nonimmediate_operand" "=rm,rm,rS,rm")
       (plus:SI (match_operand:SI 1 "nonimmediate_operand"  "%0, 0, 0,rm")
                (match_operand:SI 2 "general_operand"   "QI, K, i,rm")))]
,
)

(define_insn "*addsi_cc"
 [(set (reg:CC CC_REGNUM)
       (compare:CC
          (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0, 0,  0,rm")
                   (match_operand:SI 2 "general_operand"  "QI, K,  i,rm"))
          (const_int 0)))
  (set (match_operand:SI 0 "nonimmediate_operand" "=rm,rm,rS,rm")
       (plus:SI (match_dup 1)
                (match_dup 2)))]
 



filling delay slots with branches

2013-03-05 Thread Alan Lehotsky
Am I correct in my understanding that you can't put a branch instruction in the 
delay slot of a branch instruction?

Semantically, the HW I'm looking at annuls the branch in the delay slot if the 
first branch is taken, but any other instructions are not annulled; but it 
appears that there's no way to describe this in the define_delay() and it looks 
to me like the delay-slot for the instruction in the delay slot won't get 
filled properly either.

e.g.
cmpi $r1,0
jeq  $1
   jlt  $2
 jmp  $3
   nop

would be a 3-way branch  on zero, neg or  (by elimination) positive values with 
 the indented instructions being
in a branch delay slot.
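
For reference, the intended behaviour of that sequence written as plain C. The enum and function names are made up; the comments assume the hardware semantics described above, where a branch sitting in a delay slot is annulled when the preceding branch is taken.

```c
enum branch_target { TARGET_1, TARGET_2, TARGET_3 };

/* What the cmpi/jeq/jlt/jmp sequence is meant to compute.  */
enum branch_target three_way_branch(int r1)
{
    if (r1 == 0) return TARGET_1;  /* jeq $1 taken; the jlt in its slot is annulled */
    if (r1 < 0)  return TARGET_2;  /* jlt $2, executed from the jeq's delay slot */
    return TARGET_3;               /* jmp $3, reached only for positive values */
}
```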






Can DWARF2 CFI represent a static return location?

2013-02-19 Thread Alan Lehotsky
I'm looking at a machine with limited stack, and no push instructions or 
displaced-addressing mode.  The call instruction stores the return address in 
the link register.

For non-recursive functions we save the return address in a static memory 
location, but I can't find a way to tell the DWARF2 CFI that the saved location 
is a static MEM rtx.  Am I missing something?
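
At the DWARF level, a save location that is not CFA-relative can be described with DW_CFA_expression, whose expression yields the address where the register was saved; DW_OP_addr pushes an absolute address. The sketch below only shows that byte encoding. The function names and the register number in the usage are made up, it assumes a little-endian target with addresses of at most 4 bytes, and it does not address whether the 4.6-era dwarf2out code can be made to emit this from a static MEM.

```c
#include <stddef.h>
#include <stdint.h>

#define DW_CFA_expression 0x10  /* register saved at address given by expression */
#define DW_OP_addr        0x03  /* push a constant target address */

/* Append an unsigned LEB128 value; returns the number of bytes written. */
static size_t put_uleb128(uint8_t *out, uint32_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value)
            byte |= 0x80;
        out[n++] = byte;
    } while (value);
    return n;
}

/* Encode "register `reg` is saved at absolute address `addr`".
   Assumes little-endian addresses with addr_size <= 4; `out` needs
   room for at least 16 bytes.  Returns the number of bytes written.  */
size_t encode_static_save(uint8_t *out, uint32_t reg, uint32_t addr,
                          unsigned addr_size)
{
    size_t n = 0;
    out[n++] = DW_CFA_expression;
    n += put_uleb128(out + n, reg);
    n += put_uleb128(out + n, 1 + addr_size);   /* expression block length */
    out[n++] = DW_OP_addr;
    for (unsigned i = 0; i < addr_size; i++)
        out[n++] = (uint8_t)(addr >> (8 * i));
    return n;
}
```

The resulting bytes are the sort of thing that could be fed to the assembler through `.cfi_escape` by hand when checking whether the debugger unwinds correctly.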




code hoisting with CCmode condition codes

2013-01-06 Thread Alan Lehotsky
I'm obviously doing something stupid here; but I'm at a loss to figure it out.

Porting to a machine where most instructions set some condition codes and 
before hoisting, we have


(insn 1205 1204 1206 65 (set (reg:CC_ZN 24 *cc)
(compare:CC_ZN (reg:SI 843)
(reg:SI 844))) 
../../../../../uberbaum/newlib/libc/stdio/vfprintf.c:1046 83 {*cmpsi_zn}
 (expr_list:REG_EQUAL (compare:CC_ZN (reg:SI 843)
(const_int 0 [0]))
(expr_list:REG_DEAD (reg:SI 844)
(nil))))

(jump_insn 1206 1205 1207 65 (set (pc)
(if_then_else (eq (reg:CC_ZN 24 *cc)
(const_int 0 [0]))
(label_ref 1215)
(pc))) ../../../../../uberbaum/newlib/libc/stdio/vfprintf.c:1046 86 
{branch_insn}
 (expr_list:REG_DEAD (reg:CC_ZN 24 *cc)
(nil))
 -> 1215)


But after the hoist pass runs, we've  inserted a   (set (reg:1884) (plus (reg 
674) (const_int 4)))
between the CC set and use.


(insn 1203 1202 1205 62 (clobber (reg:CC 24 *cc)) 
../../../../../uberbaum/newlib/libc/stdio/vfprintf.c:1046 -1
 (expr_list:REG_UNUSED (reg:CC 24 *cc)
(nil)))

(insn 1205 1203 6262 62 (set (reg:CC_ZN 24 *cc)
(compare:CC_ZN (reg:SI 843)
(const_int 0 [0]))) 
../../../../../uberbaum/newlib/libc/stdio/vfprintf.c:1046 83 {*cmpsi_zn}
 (expr_list:REG_EQUAL (compare:CC_ZN (reg:SI 843)
(const_int 0 [0]))
(expr_list:REG_DEAD (reg:SI 844)
(nil))))

(insn 6262 1205 1206 62 (set (reg/f:SI 1884 [ ap ])
(plus:SI (reg/v/f:SI 674 [ ap ])
(const_int 4 [0x4]))) 10 {*addsi}
 (nil))

(jump_insn 1206 6262 1207 62 (set (pc)
(if_then_else (eq (reg:CC_ZN 24 *cc)
(const_int 0 [0]))
(label_ref 1215)
(pc))) ../../../../../uberbaum/newlib/libc/stdio/vfprintf.c:1046 86 
{branch_insn}
 (expr_list:REG_DEAD (reg:CC_ZN 24 *cc)
(nil))
 -> 1215)


and this is a problem because that will codegen as a 

move  $rtemp,src
addi  $rtemp,4

and both of those instructions end up setting condition codes, breaking the 
dependency with the compare that actually sets the CCs..

I've looked at a number of the other CCmode ports, but can't see how they 
prevent this from occurring.

Do I need to do something like change my define_expand for addsi3 to be 
something like

 
[ (parallel [ (clobber:CCmode (reg:CCmode CCREG))
(set:SI (match_operand:SI 0  )
(plus:SI (match_operand:SI 1   )
   (match_operand:SI 2  
)))])]

So that there's an explicitly stated dependency that would prevent motioning 
before the use in the jump_insn?

But I don't see this construct used in other ports that have CCmode registers 
as opposed to cc0

I'm porting of 4.6.1 - is there any better documentation of CCmode ports or a 
reference port with 


Using a 'V' constraint with QI mode....

2012-12-29 Thread Alan Lehotsky
The V constraint is essentially implemented by checking that the addressing 
mode presented is NOT offsettable.  But that's done by adding 
GET_MODE_SIZE(mode) - 1.

I've got a machine that supports indirection but not offsetting or indexing.  
But the V constraint fails for any 

(mem:QI (reg:SI p0 ) )

Am I missing some trick that would allow me to make effective use of V?
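
The QImode failure follows directly from the arithmetic described above. Here is a toy model, assuming a target whose only legitimate address offset is zero; the function names are made up and this is not GCC's actual offsettable-address code.

```c
#include <stdbool.h>

/* Assumed target: plain register indirection only, no displacement.  */
static bool target_legitimate_offset(long offset)
{
    return offset == 0;
}

/* An address counts as offsettable if it is still legitimate after
   adding (mode size - 1), per the description in the post.  */
static bool offsettable_p(unsigned mode_size)
{
    return target_legitimate_offset((long)mode_size - 1);
}

/* 'V' accepts a memory operand only when it is NOT offsettable.  */
bool satisfies_V(unsigned mode_size)
{
    return !offsettable_p(mode_size);
}
```

For a 1-byte mode the probe offset is 0, so the address always passes the offsettable test and V rejects it; for a 4-byte mode the probe offset is 3, which the target refuses, so V accepts.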


Bad and/or stupid code for DImode compares with gcc 4.6.1

2012-10-31 Thread Alan Lehotsky
I'm looking at code generated for a new port of gcc using 4.6.1 and failing 
execute/950607-2.c with -O0 only 

The target chip has only 32 bit instructions, so it's using 
do_jump_by_parts_relop_rtx() to expand the compare.
I've set up my .md to use the CCmode. 

I see one case that seems really stupid, and one that's just wrong.  I'm 
thinking that either I have something really flawed with my port's handling of 
DImode or that there was a bug in 4.6.1.  The port is only failing about 2100 
dejagnu tests (passing 64000+) and a good chunk of the failures are due to the 
ridiculously small data-memory size of the chip.

For

long long int x;

if ( x < 0 ) return 0; else return 2;

I see code that compares MSBs and branches on < (less than) as expected.  But 
then it goes and checks the MSBs for != , and finally it checks the LSBs and 
emits a conditional branch to  the ELSE, followed by an unconditional branch to 
the ELSE, so that I end up with code that looks like

mov $r1,x
mov $r2,x+4
cmpi $r2,0
jlt   .L5
cmpi  $r2,0   <=== totally redundant for x < 0 comparisons
jne .L2
cmpi $r1,0
jmp .L4

.L5 : movi $r1, 0
 jump .L4

.L2  : movi $r1, 2

.L4:
  ret


This is a simplification of 950607-2.c, which fails at -O0, but passes at 
higher optimization levels (go figure...)
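
For comparison, this is what a by-parts signed compare has to compute, written in C. It is a sketch with made-up function names, not the code in do_jump_by_parts_greater_rtx(), and it relies on GCC's arithmetic right shift of negative values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed 64-bit "a < b" using only 32-bit compares: signed on the
   high words, unsigned on the low words when the high words tie.  */
bool less_than_by_parts(int64_t a, int64_t b)
{
    int32_t  a_hi = (int32_t)(a >> 32), b_hi = (int32_t)(b >> 32);
    uint32_t a_lo = (uint32_t)a,        b_lo = (uint32_t)b;

    if (a_hi < b_hi)
        return true;
    if (a_hi != b_hi)
        return false;
    return a_lo < b_lo;
}

/* Against zero only the sign of the high word matters, so neither the
   second high-word compare nor the low-word compare is needed.  */
bool is_negative_by_parts(int64_t x)
{
    return (int32_t)(x >> 32) < 0;
}
```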



Re: Bad and/or stupid code for DImode compares with gcc 4.6.1

2012-10-31 Thread Alan Lehotsky
So, I found the patch to do_jump_by_parts_greater_rtx() by Eric Botcazou that 
should address the stupid code and the redundant branch.

Should have done a broader search before I wasted email bandwidth...
On Oct 31, 2012, at 1:51 PM, Alan Lehotsky alehot...@me.com wrote:

 I'm looking at code generated for a new port of gcc using 4.6.1 and failing 
 execute/950607-2.c with -O0 only 
 
 The target chip has only 32 bit instructions, so it's using 
 do_jump_by_parts_relop_rtx() to expand the compare.
 I've set up my .md to use the CCmode. 
 
 I see one case that seems really stupid, and one that's just wrong.  I'm 
 thinking that either I have something really flawed with my port's handling of 
 DImode or that there was a bug in 4.6.1.  The port is only failing about 
 2100 dejagnu tests (passing 64000+) and a good chunk of the failures are due 
 to the ridiculously small data-memory size of the chip.
 
 For
 
   long long int x;
 
    if ( x < 0 ) return 0; else return 2;
 
 I see code that compares MSBs and branches on < (less than) as expected.  But 
 then it goes and checks the MSBs for != , and finally it checks the LSBs and 
 emits a conditional branch to  the ELSE, followed by an unconditional branch 
 to the ELSE, so that I end up with code that looks like
 
   mov $r1,x
   mov $r2,x+4
cmpi $r2,0
jlt   .L5
cmpi  $r2,0   <=== totally redundant for x < 0 comparisons
jne .L2
cmpi $r1,0
jmp .L4
 
 .L5 : movi $r1, 0
 jump .L4
 
 .L2  : movi $r1, 2
 
 .L4:
  ret
 
 
 This is a simplification of 950607-2.c, which fails at -O0, but passes at 
 higher optimization levels (go figure...)
 



problems in interaction between peephole on CALL_INSN and final_scan_insn

2012-07-08 Thread Alan Lehotsky
When a peephole is recognized, the first insn in the group is replaced by a 
pseudo insn that contains all the referenced operands in the TEMPLATE and sets 
an INSN_CODE to indicate which peephole matched.

This is all well and good, except that if the peephole involves a CALL_INSN, 
final_scan_insn() will invoke call_from_call_insn() to try and get the call 
RTL.  But if the peephole is in fact some kind of a tail call, we no longer 
have a call expression to be found and end up asserting in 
call_from_call_insn().

I think I can work around this by switching to a define_peephole2 converting 
the call and return into an unspec, or maybe by doing a match that grabs the 
whole call as an operand instead of just the function address.

I'm not sure if the correct fix to this involves changing the way genpeep.c 
works or changing call_from_call_insn to be more forgiving - either one seems 
really difficult unless there's existing code that transmutes top-level RTL 
among CALL_INSN, JUMP_INSN, etc already...


Just in case I'm doing something stupid, here's my peephole

(define_peephole
  [
   (parallel [(set (reg:SI RV_REGNUM)
                   (call (match_operand:SI 0 "memory_operand" "")
                         (match_operand 1 "" "")))
              (clobber (reg:SI LR_REGNUM))])
   (parallel [(use (match_operand 2 "ieu_operand" "rm"))
              (return)])
  ]
  "!final_sequence"

  {
     if (CONSTANT_P (operands[0]))
       return "jmp\t%0\; mov\tr0,%2";
     else
       return "ret\t%0\; mov\tr0,%2";
  }

  [(set_attr "type" "call")]
)



Re: problems in interaction between peephole on CALL_INSN and final_scan_insn

2012-07-08 Thread Alan Lehotsky
I'm certain there are better ways; can you be more specific though?

Or are you just talking about defining a sibcall_epilogue pattern?

On Jul 8, 2012, at 5:26 PM, Andrew Pinski wrote:

 On Sun, Jul 8, 2012 at 2:23 PM, Alan Lehotsky qsm...@earthlink.net wrote:
 When a peephole is recognized, the first insn in the group is replaced by a 
 pseudo insn that contains all the referenced operands in the TEMPLATE and 
 sets an INSN_CODE to indicate which peephole matched.
 
 This is all well and good, except that if the peephole involves a CALL_INSN, 
 final_scan_insn() will invoke call_from_call_insn() to try and get the call 
 RTL.  But if the peephole is in fact some kind of a tail call, we no longer 
 have a call expression to be found and end up asserting in 
 call_from_call_insn().
 
 
 Simple answer don't use peephole optimization to perform the tail call
 optimization.  There are better ways of performing that optimization.
 
 Thanks,
 Andrew



pointer modes for Harvard architecture....

2012-01-28 Thread Alan Lehotsky
I'm working on a port to a Harvard architecture where the data memory addresses 
are only 14 bits wide (i.e. 16kb) and the instruction address space is 21 bits 
wide. 

I do not want to define Pmode as PSImode; the machine has separate address 
registers for data memory AND with such limited data memory, I really want data 
pointers to stay HImode.

I've noticed that some generated function calls are appearing as 

(call (mem:SI  (symbol_ref:HI (function_name) 

which I suspect is wrong for code addresses outside of the first 65kb of 
instruction memory.
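
The widths involved are easy to check on the host. In this sketch the enum constants just restate the numbers from this post, and the helper names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

enum { DATA_ADDR_BITS = 14, CODE_ADDR_BITS = 21, HIMODE_BITS = 16 };

/* Number of addressable units for a given address width.  */
uint32_t address_space_size(unsigned bits)
{
    return UINT32_C(1) << bits;
}

/* True if `addr` is representable in `bits` bits.  */
bool fits_in_bits(uint32_t addr, unsigned bits)
{
    return addr < (UINT32_C(1) << bits);
}
```

Any data address fits in HImode with room to spare, but a code address such as 0x12345 does not, which is the case a symbol_ref:HI call target would get wrong.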

It would be helpful to see an existing port with wider function pointers to 
help me avoid stumbling over some of these issues. Is there a current port that 
has larger instruction memory addresses than  data addresses? 








printed versions of GCC Internals book?

2011-11-01 Thread Alan Lehotsky
While I really like machine-readable (and searchable) text online for the GCC 
internals, there's still an atavistic streak in me that wants hard copy that I 
can put post-it notes on, run a highlighter over relevant passages or read when 
I'm not near a computer screen.

I have two bound hard-copies (but the newer one is GCC 2.95) and laser-printed 
newer editions, but I've decided I really miss the bound-book format.

Anybody have any experience with using one of the print-on-demand services to 
produce a recent version of the gccint manual?  I was actually kind of 
surprised that the FSF hasn't taken advantage of this as a fund-raising 
opportunity.

After the initial setup costs, it looks like the per/book price for the 700pg 
gccint would be about $20, but the setup fees (at least here 
http://www.harvard.com/on_our_shelves/in_store_book_printing/books_on_demand/ ) 
would be ~$100.

So, unless someone has already done this, is there anyone else who'd want to 
buy a printed copy at a price that would recover my investment in the setup 
costs and postage?  I'd be happy to turn over the whole project to the FSF so 
they could end up with an ongoing revenue stream once I break-even on the 
deal.

I'd guess that with 10 copies, we'd be looking at ~$35/copy, which is about as 
high a price as I'd be willing to pay if I was reading this email instead of 
writing it.

So, are there 10 people out there who'd like a reasonably current version of 
the Internals book, or is there someone else who'd like to drive?

-- Al Lehotsky



Re: Question about perl while bootstrapping gcc

2010-04-16 Thread Alan Lehotsky
This is normal unix behavior (unless you have some kind of shell that I'm 
unfamiliar with.)

When you use & to create a subjob, it is still attached to your terminal 
session.

Take a look at the at(1) or batch(1) commands if you really want to execute a 
command and logout while it's still running.


-Original Message-
From: Dominique Dhumieres domi...@lps.ens.fr
Sent: Apr 16, 2010 2:10 PM
To: gcc@gcc.gnu.org
Subject: Question about perl while bootstrapping gcc

Hi!

I use to build gcc with a command line such as

make -j2  somelogfile 

I recently found that if I logout, the build fails with

perl: no user 501

Is this a bug or a feature? In the former case I'll open a PR.
In the later is it documented somewhere that you should not logout
while building gcc? If yes, is it possible to have a pointer?

TIA

Dominique



Re: Is it possible to port GCC backend to a architecture with very limited hard registers?

2010-03-17 Thread Alan Lehotsky
Almost certainly you will run into severe problems in the reload phase.

You might also profitably study the ip2k port.  This is a ALU machine, but it 
does have multiple
address registers.


-Original Message-
From: redriver jiang jiang.redri...@gmail.com
Sent: Mar 17, 2010 8:55 AM
To: gcc@gcc.gnu.org
Subject: Is it possible to port GCC backend to a architecture with very
limited hard registers?

Hi all,

Right now I am attempting to port the GCC backend to an MCU with very
limited hard registers: only one 8 bit ACC reg, one 16 bit base reg
for addressing, one status reg.
I searched for GCC backend ports, and it seems 68HC1X has a similar
situation, but it uses many RAM-simulated registers. I wonder whether it
is possible to provide these limited 3 registers to the GCC backend, and let
all 16bit (HImode) and 32bit (SImode) operands spill to the stack.

Thanks!

Redriver



Re: How to control code segments ?

2008-11-30 Thread Alan Lehotsky

Look at the implementation of the IP2K compiler and linker.
It uses a segmented paged architecture just like the machine you are 
describing.

In essence what we did was implement linker relaxation to deal with this.
When we called any function, we emitted the appropriate long-call by setting 
the page register and jumping to the location on that page.

In the linker, we implemented relaxation code that looked to see if
we were changing to the SAME page, and if so deleted the instruction changing 
the PAGE and did a local jump to the destination.  Now, because a function 
could cross a page boundary (we only had 4kb pages and 16 bit instructions), 
all our branches were done this way (if I recall correctly).

It's a little tedious, but not too technically demanding a solution.
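
The decision the relaxation pass makes can be sketched in a few lines. This is a toy model with made-up names, assuming a page is 4096 address units; it is not the IP2K linker code.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_UNITS 4096u  /* assumed page size, in address units */

static uint32_t page_of(uint32_t addr)
{
    return addr / PAGE_UNITS;
}

/* True if the page-register load in front of a long call can be
   deleted, leaving just a local jump to the destination.  */
bool can_relax_long_call(uint32_t call_site, uint32_t destination)
{
    return page_of(call_site) == page_of(destination);
}
```

Deleting an instruction shifts everything after it, so a real pass has to re-check call sites near a page boundary until nothing changes.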

Al Lehotsky

On Nov 30, 2008, at 2:06 PM, Dong Phuong wrote:


I'm porting for a microcontroler which has segmented
memory.

The memory is divided into many pages, each page is
16K. And I'm going to use 256 pages for code. But
these 256 pages are not continuous in physical memory,
so when I want to jump to a function, I have to know
what is the segment address of this function, and then
set the CSP with this value, and jump to it.

So what I want to know is if I'm in a function, is
there any way for me to know what code segment I'm
located in ? If I know this,  when I have to jump to
another function, I can decide whether this function
is in the same segment as the function that I'm
located in, and then can decide if I have to change
the CSP.

And when I compile a long long program with so many
methods, is there any way for GCC so that it can
realize that the code has exceeded 16K and have to use
a new segments ? or the user must explicit declare
this in the C source program ?

If you know any hints or any document about this,
please show me. Thank you very much.







Re: gcc compiler for pdp10

2008-04-18 Thread Alan Lehotsky

Martin,

  I did a port of GCC to the Analog Devices SHARC chip.  I ended up 
supporting 3 kinds of pointers for this chip (two for address spaces and one 
for byte pointers).  The chip itself is only word addressable, although words 
can be from 16 to 48 bits in size depending on what memory is being accessed.

I also worked on the Bliss-36 compiler at DEC, so I'm well acquainted  
with the PDP10 architecture.


I don't have access to any 10/20 HW, but I'd be happy to act as a  
reviewer/advisor to your changes.


Al Lehotsky

On Apr 18, 2008, at 20:21, Martin Chaney wrote:


Hi,

I am the proprietor of a gcc compiler for the PDP10 architecture.

(This is a compiler previously worked on by Lars Brinkhoff who left  
XKL some while before I joined XKL.  It's possible some of you may  
have been familiar with him or the compiler from that time.)


The compiler is currently in a state where it is synched with  
both the 4.3 and 4.4 branches, and it passes the testsuite tests  
(with the exception of some I've flagged as expected failures for  
the pdp10).


My employer is happy to release my work on the gcc compiler back to  
the gcc community and I've sent in a request for the necessary forms.


The PDP10 architecture is unusual in various ways that distinguish  
it from the mainstream architectures supported by the gcc compiler  
and this has made the development of this compiler a significant  
task.  Undoubtedly I've made customizations in inappropriate ways.   
I'm seeking contacts with people who might be able to advise me on  
how to cleanup my implementation to reduce the amount of #ifdef  
__PDP10_H__ I've sprinkled liberally throughout the source.  Also,  
if it's possible to get simple changes made to prevent breaking my  
PDP10 version and that are otherwise innocuous that would be  
wonderful.  For example, the PDP10 word size is 36 bits;  Fairly  
recently people have taken to writing code that assumes word size  
is a power of 2 even when it's straightforward to write in a manner  
that doesn't make that assumption.
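
A minimal sketch of the point about word size, assuming a bitmap indexed by bit number. The names (BITS_PER_WORD, word_index, bit_in_word, words_for_bits, set_bit) are made up for the example and are not taken from the gcc sources. Each uint64_t element models one 36-bit target word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical target word size; 36 matches the PDP10, any value <= 64 works. */
#define BITS_PER_WORD 36u

/* Word holding a given bit. Division is correct for any word size,
   whereas "bit >> 5" silently assumes 32-bit words. */
static size_t word_index(size_t bit)
{
    return bit / BITS_PER_WORD;
}

/* Position of the bit inside its word. Modulo is correct for any word
   size, whereas "bit & (BITS_PER_WORD - 1)" needs a power of two. */
static unsigned bit_in_word(size_t bit)
{
    return (unsigned)(bit % BITS_PER_WORD);
}

/* Number of words needed to hold nbits bits, rounding up. */
static size_t words_for_bits(size_t nbits)
{
    return (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Set one bit in a bitmap whose elements each model one target word. */
static void set_bit(uint64_t *words, size_t nwords, size_t bit)
{
    assert(word_index(bit) < nwords);
    words[word_index(bit)] |= (uint64_t)1 << bit_in_word(bit);
}
```

When the word size is a compile-time constant that happens to be a power of 2, an optimizing compiler can be expected to reduce the division and modulo to a shift and a mask anyway, so the portable form should cost nothing on mainstream targets.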


Considering the large number of files customized to get the PDP10  
compiler working, I'm not sure whether it's possible to get it to  
build directly from the gcc trunk, but it would be nice to work  
toward that goal.


Some other things which distinguish the PDP10 architecture from  
assumptions in the gcc code base include: its variety of formats of  
pointers only one of which can be viewed as an integer and that one  
is capable of referencing only word aligned data, a functional  
difference between signed and unsigned integers, and peculiarities  
to the use of PDP10 byte arrays which are very difficult to describe.


Any help or advice would be appreciated.

Martin Chaney
XKL, LLC




Re: GCC Port (gcc backend) for Microchip PICMicro microcontroller

2006-04-11 Thread Alan Lehotsky


On Apr 11, 2006, at 03:46, Colm O' Flaherty wrote:

I'm not quite sure I follow you.. if it's possible to dedicate a  
register to act as the data-stack pointer, and implement it that  
way, why would I want to keep the SP as a virtual register?  I'm  
not being antagonistic when I say that.. I'm just trying to  
understand what you're trying to tell me..


Sorry, thought you were indicating that you didn't WANT a data
stack :-)  Now I understand that your chip just doesn't provide
hardware support for stacks.

BTW, another port I did (to a RISCy architecture that was the core
for a high-speed multiprotocol router; never submitted to the FSF,
and the company's now belly-up) provides a lower bound on how simple
an addressing scheme you can deal with.

This machine had 512 directly addressable memory locations and 4  
registers that could be indirected thru
(kinda like the way the PDP-8 worked).  With 3 registers available  
for reload (1 was reserved for the SP),
you could pretty much compile all the GCC test suite that didn't need  
more than 512 words of memory.

[Oh, BTW,  chars were 32 bits on this machine also]




Will check out the ip2k port again.. the last time I looked, I was  
blinded by the assumption that if the usual stack macros were  
defined in a straightforward fashion, that the target actually  
supported (or implemented) a stack... It ain't necessarily so.



you might be able to keep the SP as a virtual register and make sure
that code generation never tries to actually use it







Re: GCC Port (gcc backend) for Microchip PICMicro microcontroller

2006-04-10 Thread Alan Lehotsky

Again, the GCC3 distribution has a port of the IP2K microcontroller.

It has a hardware call stack, but the data stack is implemented  
entirely in software.


You will have to dedicate a register to act as the data-stack  
pointer.  I suppose if you limit yourself to
writing functions with NO stack-local data you might be able to keep  
the SP as a virtual register and make sure
that code generation never tries to actually use it.  You will also  
be severely limited in the ability to
pass parameters if you only allow register parameters with no  
parameter saving.
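
As a rough illustration of a data stack implemented entirely in software, here is a minimal sketch. It is not the IP2K port's code: the array, the dsp variable standing in for the dedicated register, and the helper names are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size of the RAM region set aside for the data stack. */
#define DATA_STACK_SIZE 64u

static uint8_t data_stack[DATA_STACK_SIZE];

/* Stands in for the dedicated data-stack pointer register.
   The stack grows downward from the top of the region. */
static unsigned dsp = DATA_STACK_SIZE;

/* Push one byte: decrement the pointer, then store. */
static void ds_push(uint8_t v)
{
    assert(dsp > 0);
    data_stack[--dsp] = v;
}

/* Pop one byte: load, then increment the pointer. */
static uint8_t ds_pop(void)
{
    assert(dsp < DATA_STACK_SIZE);
    return data_stack[dsp++];
}

/* Function prologue: reserve space for locals, return the frame base. */
static unsigned frame_enter(unsigned locals)
{
    assert(dsp >= locals);
    dsp -= locals;
    return dsp;
}

/* Function epilogue: release the locals reserved by frame_enter. */
static void frame_leave(unsigned locals)
{
    dsp += locals;
    assert(dsp <= DATA_STACK_SIZE);
}
```

The hardware call stack keeps handling return addresses; only locals, spills, and stack-passed parameters go through the software pointer.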


At this point, why bother writing a C compiler?


On Apr 10, 2006, at 03:54, Colm O' Flaherty wrote:

Does anyone have any ideas about what gcc support is like for  
targets with no data stack?  The 14 bit cores (16F) mostly have a  
2-8 level hardware stack, which is not part of the program or data  
memory, and is not addressable.  There is no data stack.


I'm hoping that there is an existing backend architecture where  
there is no stack, so that I can have a peep to see how the code  
fakes stack support, but so far, all the obvious candidates (the  
microcontrollers) seem to have a stack.


Ideas, anyone?

Colm






Re: GCC port for V8-uRISC (8 bit CPU)

2006-04-05 Thread Alan Lehotsky
I participated in a port to an 8-bit internet toaster 4 years ago (the Ubicom 
IP2k chip).

It's distributed as part of the gcc-3.x releases, but has been dropped from the 
gcc-4.x distributions.

The IP2k was a very restrictive environment, and it took a lot of work to get 
it to generate really tight code.
I'd definitely suggest looking at gcc/config/ip2k to see how we did it.

-Original Message-
From: Nemanja Popov [EMAIL PROTECTED]
Sent: Apr 5, 2006 9:50 AM
To: gcc@gcc.gnu.org
Subject: GCC port for V8-uRISC (8 bit CPU)

Hi,

Can somebody please explain to me whether it is reasonable and possible to port gcc 
(version 4.xx) to an 8-bit CPU architecture?
I would appreciate a precise explanation of why it is or is not possible.

CPU is V8-uRISC.

V8-uRISC Features are:

  8-bit ALU
  64K byte addressing capability
  Accumulator (R0)
  Seven 8-bit General Purpose Registers (R1-R7)
  Multiple register banks are easily implemented
  16-bit Program Counter and Stack Pointer

Thanks in advance for any information.
Regards,
Nemanja Popov 




Crazy ICE from gcc 4.1.0

2006-03-09 Thread Alan Lehotsky
I've built a generic 4.1.0 for RH7.3 x86 linux (I did a make bootstrap)

Compiling a rather large file, I get 

tmp.f_00.cxx:26432: error: unrecognizable insn:
(insn 173 172 174 9 (set (reg:QI 122)
        (const_int 128 [0x80])) -1 (nil)
    (nil))
tmp.f_00.cxx:26432: internal compiler error: in extract_insn, at recog.c:2020


Which looks insane, because there's a perfectly good define_insn (cf *movqi_1 
in i386.md)
I'm trying to reduce this to a reasonably sized test case (and I'm going to try 
debugging this in the recognizer),
but I can't see why this instruction isn't matching the 2nd constraint 
alternative and just producing a movb r,#128


(define_insn "*movqi_1"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
        (match_operand:QI 1 "general_operand"      " q,qn,qm,q,rn,qm,qn"))]
  "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
{
  switch (get_attr_type (insn))
    {
    case TYPE_IMOVX:
      gcc_assert (ANY_QI_REG_P (operands[1]) || GET_CODE (operands[1]) == MEM);
      return "movz{bl|x}\t{%1, %k0|%k0, %1}";
    default:
      if (get_attr_mode (insn) == MODE_SI)
        return "mov{l}\t{%k1, %k0|%k0, %k1}";
      else
        return "mov{b}\t{%1, %0|%0, %1}";
    }
}
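
One possible explanation, not confirmed anywhere in this thread: GCC keeps CONST_INT values sign-extended for the mode they are used in, so the canonical QImode form of 0x80 would be (const_int -128), and operand predicates such as general_operand reject a constant that does not survive truncation to the mode. The helpers below are a hypothetical stand-alone imitation of that check (the names canonical_const and is_canonical are invented here); they are not GCC's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend `value` from its low `width` bits,
   imitating the canonical form of an integer constant in a mode that
   is `width` bits wide. */
static int64_t canonical_const(int64_t value, unsigned width)
{
    uint64_t mask, low, sign;

    assert(width >= 1 && width <= 64);
    if (width == 64)
        return value;

    mask = ((uint64_t)1 << width) - 1;
    low  = (uint64_t)value & mask;       /* keep only the low bits   */
    sign = (uint64_t)1 << (width - 1);   /* sign bit of the mode     */

    /* (low ^ sign) - sign propagates the sign bit upward. */
    return (int64_t)(low ^ sign) - (int64_t)sign;
}

/* Nonzero if `value` is already in canonical form for `width` bits. */
static int is_canonical(int64_t value, unsigned width)
{
    return canonical_const(value, width) == value;
}
```

Under that assumption, 128 is not canonical for an 8-bit mode while -128 is, so whatever generated insn 173 would be the place to look, rather than the *movqi_1 pattern itself.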



Re: Bug in PPC inline assembly?

2005-07-17 Thread Alan Lehotsky


On Jul 17, 2005, at 19:15, Stefan wrote:

I have some problems with using inline PowerPC assembly in GCC 
(4.0.1). Consider the following code:


    void save_fp_register(double* buffer)
    {
        asm("stfd F0, 0(%0)" : : "r" (buffer));
    }

Try using 'b' for the constraint - that selects for an address base 
register, as opposed to 'r',
which is any of the general registers (including R0).
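
The reason 'b' matters: in PowerPC loads and stores of the form stfd FRS, D(RA), a base register field of 0 means the literal value zero rather than the contents of R0, so if the compiler picks R0 for an 'r' operand the store goes to the wrong address. Below is a minimal sketch of the 'b' constraint in use. It is a variation on the function above, not a drop-in replacement: the name store_double is made up, the value to store is passed as an operand instead of naming F0 directly, and hardware floating point is assumed. The plain C fallback exists only so the example compiles on non-PowerPC hosts.

```c
#include <assert.h>

/* Store `value` at `buffer` using stfd on PowerPC.
   Hypothetical helper for illustration only. */
static void store_double(double *buffer, double value)
{
    assert(buffer != 0);
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__))
    /* "b" restricts the address to a base register, i.e. never R0.
       "=m" tells the compiler that *buffer is written by the asm. */
    __asm__ volatile ("stfd %1, 0(%2)"
                      : "=m" (*buffer)
                      : "f" (value), "b" (buffer));
#else
    /* Fallback so the sketch builds on other targets. */
    *buffer = value;
#endif
}
```

Calling store_double(&out, 3.5) should leave 3.5 in out on either path.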