date:20230811

Re: Weak attribute function limitation

2023-08-11 Thread Jason Merrill via Gcc


On 8/10/23 05:22, Ashraf, Islam via Gcc wrote:

I have a question regarding a limitation in gcc which is when I use gcc to link 
my main.o file with 2 .so files each one has a function with the same name but 
one of them has it with the __attribute__(weak) and I call this function from 
main.o the execution occurs dependently on the order of linking. Do you have a 
plan to fix this bug or is it not in your plans?


This doesn't seem like a GCC issue.  Linking and symbol resolution is 
handled either by the linker itself (from binutils) or the runtime 
loader ld.so, which is part of glibc.  As I recall the glibc ld.so used 
to have the behavior you seem to expect, but was deliberately changed 
many years ago to the behavior you observe, I think for performance reasons.


Jason

Re: [Predicated Ins vs Branches] O3 and PGO result in 2x performance drop relative to O2

2023-08-11 Thread Changbin Du via Gcc

On Mon, Jul 31, 2023 at 08:55:35PM +0800, Changbin Du wrote:
> Hello, folks.
> This is to discuss Gcc's heuristic strategy about Predicated Instructions and
> Branches. And probably something needs to be improved.
> 
> [The story]
> Weeks ago, I built a huffman encoding program with O2, O3, and PGO 
> respectively.
> This program is nothing special, just a random code I found on the internet. 
> You
> can download it from http://cau.ac.kr/~bongbong/dsd08/huffman.c.
> 
> Build it with O2/O3/PGO (My GCC is 13.1):
> $ gcc -O2 -march=native -g -o huffman huffman.c
> $ gcc -O3 -march=native -g -o huffman.O3 huffman.c
> 
> $ gcc -O2 -march=native -g -fprofile-generate -o huffman.instrumented 
> huffman.c
> $ ./huffman.instrumented test.data
> $ gcc -O2 -march=native -g -fprofile-use=huffman.instrumented.gcda -o 
> huffman.pgo huffman.c
> 
> Run them on my 12900H laptop:
> $ head -c 50M /dev/urandom > test.data
> $ perf stat  -r3 --table -- taskset -c 0 ./huffman test.data
> $ perf stat  -r3 --table -- taskset -c 0 ./huffman.O3 test.data
> $ perf stat  -r3 --table -- taskset -c 0 ./huffman.pgo test.data
> 
> The result (p-core, no ht, no turbo, performance mode):
> 
> O2  O3  PGO
> cycles  2,581,832,749   8,638,401,568   9,394,200,585
> (1.07s) (3.49s) (3.80s)
> instructions12,609,600,094  11,827,675,782  12,036,010,638
> branches2,303,416,221   2,671,184,833   2,723,414,574
> branch-misses   0.00%   7.94%   8.84%
> cache-misses3,012,613   3,055,722   3,076,316
> L1-icache-load-misses   11,416,391  12,112,703  11,896,077
> icache_tag.stalls   1,553,521   1,364,092   1,896,066
> itlb_misses.stlb_hit6,856   21,756  22,600
> itlb_misses.walk_completed  14,430  4,454   15,084
> baclears.any131,573 140,355 131,644
> int_misc.clear_resteer_cycles   2,545,915   586,578,125 679,021,993
> machine_clears.count22,235  39,671  37,307
> dsb2mite_switches.penalty_cycles 6,985,838  12,929,675  8,405,493
> frontend_retired.any_dsb_miss   28,785,677  28,161,724  28,093,319
> idq.dsb_cycles_any  1,986,038,896   5,683,820,258   5,971,969,906
> idq.dsb_uops11,149,445,952  26,438,051,062  28,622,657,650
> idq.mite_uops   207,881,687 216,734,007 212,003,064
> 
> 
> Above data shows:
>   o O3/PGO lead to *2.3x/2.6x* performance drop than O2 respectively.
>   o O3/PGO reduced instructions by 6.2% and 4.5%. I think this attributes to
> aggressive inline.
>   o O3/PGO introduced very bad branch prediction. I will explain it later.
>   o Code built with O3 has high iTLB miss but much lower sTLB miss. This is 
> beyond
> my expectation.
>   o O3/PGO introduced 78% and 68% more machine clears. This is interesting and
> I don't know why. (subcategory MC is not measured yet)
>   o O3 has much higher dsb2mite_switches.penalty_cycles than O2/PGO.
>   o The idq.mite_uops of O3/PGO increased 4%, while idq.dsb_uops increased 2x.
> DSB hit well. So frontend fetching and decoding is not a problem for 
> O3/PGO.
>   o Other events are mainly affected by bad branch misprediction.
> 
> Additionally, here is the TMA level 2 analysis: The main changes in the 
> pipeline
> slots are of Bad Speculation and Frontend Bound categories. I doubt the 
> accuracy
> of tma_fetch_bandwidth according to above frontend_retired.any_dsb_miss and
> idq.mite_uops data.
> 
> $ perf stat --topdown --td-level=2 --cputype core -- taskset -c 0 ./huffman 
> test.data
> test.data.huf is 1.00% of test.data
> 
>  Performance counter stats for 'taskset -c 0 ./huffman test.data':
> 
>  %  tma_branch_mispredicts%  tma_core_bound %  tma_heavy_operations %  
> tma_light_operations  %  tma_memory_bound %  tma_fetch_bandwidth %  
> tma_fetch_latency %  tma_machine_clears
>0.0  0.8 11.4  
>76.8  2.0 8.3  
> 0.80.0
> 
>1.073381357 seconds time elapsed
> 
>0.945233000 seconds user
>0.095719000 seconds sys
> 
> 
> $ perf stat --topdown --td-level=2 --cputype core -- taskset -c 0 
> ./huffman.O3 test.data
> test.data.huf is 1.00% of test.data
> 
>  Performance counter stats for 'taskset -c 0 ./huffman.O3 test.data':
> 
>  %  tma_branch_mispredicts%  tma_core_bound %  tma_heavy_operations %  
> tma_light_operations  %  tma_memory_bound %  tma_fetch_bandwidth %  
> tma_fetch_latency %  tma_machine_clears
>   38.2  6.6  3.5  
>21.7  0.9

[pushed][LRA]: Implement output stack pointer reloads

2023-08-11 Thread Vladimir Makarov via Gcc


On 8/9/23 16:54, Vladimir Makarov wrote:



On 8/9/23 07:15, senthilkumar.selva...@microchip.com wrote:

Hi,

   After turning on FP -> SP elimination after Vlad fixed
   an elimination issue in 
https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832,

   I'm now running into reload failure if arithmetic is done on SP.

I think we can permit to stack pointer output reloads.  The only thing 
we need to update sp offset accurately for the original and reload 
insns.  I'll try to make the patch on this week.



The following patch fixes the problem.  The patch was successfully 
bootstrapped and tested on x86_64, aarch64, and ppc64le.


The test case is actually one from GCC test suite.

commit c0121083d07ffd4a8424f4be50de769d9ad0386d
Author: Vladimir N. Makarov 
Date:   Fri Aug 11 07:57:37 2023 -0400

[LRA]: Implement output stack pointer reloads

LRA prohibited output stack pointer reloads but it resulted in LRA
failure for AVR target which has no arithmetic insns working with the
stack pointer register.  Given patch implements the output stack
pointer reloads.

gcc/ChangeLog:

* lra-constraints.cc (goal_alt_out_sp_reload_p): New flag.
(process_alt_operands): Set the flag.
(curr_insn_transform): Modify stack pointer offsets if output
stack pointer reload is generated.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 09ff6de1657..26239908747 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -1466,6 +1466,8 @@ static int goal_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
 static bool goal_alt_swapped;
 /* The chosen insn alternative.	 */
 static int goal_alt_number;
+/* True if output reload of the stack pointer should be generated.  */
+static bool goal_alt_out_sp_reload_p;
 
 /* True if the corresponding operand is the result of an equivalence
substitution.  */
@@ -2128,6 +2130,9 @@ process_alt_operands (int only_alternative)
   int curr_alt_dont_inherit_ops_num;
   /* Numbers of operands whose reload pseudos should not be inherited.	*/
   int curr_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
+  /* True if output stack pointer reload should be generated for the current
+ alternative.  */
+  bool curr_alt_out_sp_reload_p;
   rtx op;
   /* The register when the operand is a subreg of register, otherwise the
  operand itself.  */
@@ -2211,7 +2216,8 @@ process_alt_operands (int only_alternative)
 	}
   reject += static_reject;
   early_clobbered_regs_num = 0;
-
+  curr_alt_out_sp_reload_p = false;
+  
   for (nop = 0; nop < n_operands; nop++)
 	{
 	  const char *p;
@@ -2682,12 +2688,10 @@ process_alt_operands (int only_alternative)
 	  bool no_regs_p;
 
 	  reject += op_reject;
-	  /* Never do output reload of stack pointer.  It makes
-		 impossible to do elimination when SP is changed in
-		 RTL.  */
-	  if (op == stack_pointer_rtx && ! frame_pointer_needed
+	  /* Mark output reload of the stack pointer.  */
+	  if (op == stack_pointer_rtx
 		  && curr_static_id->operand[nop].type != OP_IN)
-		goto fail;
+		curr_alt_out_sp_reload_p = true;
 
 	  /* If this alternative asks for a specific reg class, see if there
 		 is at least one allocatable register in that class.  */
@@ -3317,6 +3321,7 @@ process_alt_operands (int only_alternative)
 	  for (nop = 0; nop < curr_alt_dont_inherit_ops_num; nop++)
 	goal_alt_dont_inherit_ops[nop] = curr_alt_dont_inherit_ops[nop];
 	  goal_alt_swapped = curr_swapped;
+	  goal_alt_out_sp_reload_p = curr_alt_out_sp_reload_p;
 	  best_overall = overall;
 	  best_losers = losers;
 	  best_reload_nregs = reload_nregs;
@@ -4836,6 +4841,27 @@ curr_insn_transform (bool check_only_p)
 	lra_asm_insn_error (curr_insn);
 }
   lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
+  if (goal_alt_out_sp_reload_p)
+{
+  /* We have an output stack pointer reload -- update sp offset: */
+  rtx set;
+  bool done_p = false;
+  poly_int64 sp_offset = curr_id->sp_offset;
+  for (rtx_insn *insn = after; insn != NULL_RTX; insn = NEXT_INSN (insn))
+	if ((set = single_set (insn)) != NULL_RTX
+	&& SET_DEST (set) == stack_pointer_rtx)
+	  {
+	lra_assert (!done_p);
+	curr_id->sp_offset = 0;
+	lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
+	id->sp_offset = sp_offset;
+	if (lra_dump_file != NULL)
+	  fprintf (lra_dump_file,
+		   "Moving sp offset from insn %u to %u\n",
+		   INSN_UID (curr_insn), INSN_UID (insn));
+	  }
+  lra_assert (!done_p);
+}
   return change_p;
 }

[pushed][LRA]: Implement output stack pointer reloads

2023-08-11 Thread Vladimir Makarov via Gcc


On 8/9/23 16:54, Vladimir Makarov wrote:



On 8/9/23 07:15, senthilkumar.selva...@microchip.com wrote:

Hi,

   After turning on FP -> SP elimination after Vlad fixed
   an elimination issue in 
https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832,

   I'm now running into reload failure if arithmetic is done on SP.

I think we can permit to stack pointer output reloads.  The only thing 
we need to update sp offset accurately for the original and reload 
insns.  I'll try to make the patch on this week.



The following patch fixes the problem.  The patch was successfully 
bootstrapped and tested on x86_64, aarch64, and ppc64le.


The test case is actually one from GCC test suite.

commit c0121083d07ffd4a8424f4be50de769d9ad0386d
Author: Vladimir N. Makarov 
Date:   Fri Aug 11 07:57:37 2023 -0400

[LRA]: Implement output stack pointer reloads

LRA prohibited output stack pointer reloads but it resulted in LRA
failure for AVR target which has no arithmetic insns working with the
stack pointer register.  Given patch implements the output stack
pointer reloads.

gcc/ChangeLog:

* lra-constraints.cc (goal_alt_out_sp_reload_p): New flag.
(process_alt_operands): Set the flag.
(curr_insn_transform): Modify stack pointer offsets if output
stack pointer reload is generated.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 09ff6de1657..26239908747 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -1466,6 +1466,8 @@ static int goal_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
 static bool goal_alt_swapped;
 /* The chosen insn alternative.	 */
 static int goal_alt_number;
+/* True if output reload of the stack pointer should be generated.  */
+static bool goal_alt_out_sp_reload_p;
 
 /* True if the corresponding operand is the result of an equivalence
substitution.  */
@@ -2128,6 +2130,9 @@ process_alt_operands (int only_alternative)
   int curr_alt_dont_inherit_ops_num;
   /* Numbers of operands whose reload pseudos should not be inherited.	*/
   int curr_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
+  /* True if output stack pointer reload should be generated for the current
+ alternative.  */
+  bool curr_alt_out_sp_reload_p;
   rtx op;
   /* The register when the operand is a subreg of register, otherwise the
  operand itself.  */
@@ -2211,7 +2216,8 @@ process_alt_operands (int only_alternative)
 	}
   reject += static_reject;
   early_clobbered_regs_num = 0;
-
+  curr_alt_out_sp_reload_p = false;
+  
   for (nop = 0; nop < n_operands; nop++)
 	{
 	  const char *p;
@@ -2682,12 +2688,10 @@ process_alt_operands (int only_alternative)
 	  bool no_regs_p;
 
 	  reject += op_reject;
-	  /* Never do output reload of stack pointer.  It makes
-		 impossible to do elimination when SP is changed in
-		 RTL.  */
-	  if (op == stack_pointer_rtx && ! frame_pointer_needed
+	  /* Mark output reload of the stack pointer.  */
+	  if (op == stack_pointer_rtx
 		  && curr_static_id->operand[nop].type != OP_IN)
-		goto fail;
+		curr_alt_out_sp_reload_p = true;
 
 	  /* If this alternative asks for a specific reg class, see if there
 		 is at least one allocatable register in that class.  */
@@ -3317,6 +3321,7 @@ process_alt_operands (int only_alternative)
 	  for (nop = 0; nop < curr_alt_dont_inherit_ops_num; nop++)
 	goal_alt_dont_inherit_ops[nop] = curr_alt_dont_inherit_ops[nop];
 	  goal_alt_swapped = curr_swapped;
+	  goal_alt_out_sp_reload_p = curr_alt_out_sp_reload_p;
 	  best_overall = overall;
 	  best_losers = losers;
 	  best_reload_nregs = reload_nregs;
@@ -4836,6 +4841,27 @@ curr_insn_transform (bool check_only_p)
 	lra_asm_insn_error (curr_insn);
 }
   lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
+  if (goal_alt_out_sp_reload_p)
+{
+  /* We have an output stack pointer reload -- update sp offset: */
+  rtx set;
+  bool done_p = false;
+  poly_int64 sp_offset = curr_id->sp_offset;
+  for (rtx_insn *insn = after; insn != NULL_RTX; insn = NEXT_INSN (insn))
+	if ((set = single_set (insn)) != NULL_RTX
+	&& SET_DEST (set) == stack_pointer_rtx)
+	  {
+	lra_assert (!done_p);
+	curr_id->sp_offset = 0;
+	lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
+	id->sp_offset = sp_offset;
+	if (lra_dump_file != NULL)
+	  fprintf (lra_dump_file,
+		   "Moving sp offset from insn %u to %u\n",
+		   INSN_UID (curr_insn), INSN_UID (insn));
+	  }
+  lra_assert (!done_p);
+}
   return change_p;
 }

Re: LRA for avr: help with FP and elimination

2023-08-11 Thread Vladimir Makarov via Gcc




On 8/10/23 07:33, senthilkumar.selva...@microchip.com wrote:


Hi Vlad,

   I can confirm your commit 
(https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832)
   fixes the above problem, thank you. However, I see execution failures if a
   pseudo assigned to FP has to be spilled because of stack slot creation.

   To reproduce, build the compiler just like above, and then do

$ avr-gcc -mmcu=avr51 
/gcc/testsuite/gcc.c-torture/execute/20050224-1.c -O2 -S 
-fdump-rtl-all

   The execution failure occurs at this point

movw r24,r2
sbiw r24,36
brne .L8

r2 is never set anywhere at all in the assembly.

The relevant insns (in the IRA dump) are

(insn 3 15 4 3 (set (reg/v:HI 51 [ j ])
 (const_int 0 [0])) 
"gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":19:21 101 {*movhi_split}
  (expr_list:REG_EQUAL (const_int 0 [0])
 (nil)))
...
(insn 28 27 67 8 (parallel [
 (set (reg/v:HI 51 [ j ])
 (plus:HI (reg/v:HI 51 [ j ])
 (const_int 1 [0x1])))
 (clobber (scratch:QI))
 ]) 
"/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":28:8
 175 {addhi3_clobber}
  (nil))
...
(jump_insn 44 43 45 13 (parallel [
 (set (pc)
 (if_then_else (ne (reg/v:HI 51 [ j ])
 (const_int 36 [0x24]))
 (label_ref:HI 103)
 (pc)))
 (clobber (scratch:QI))
 ]) 
"/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":11:16
 discrim 1 713 {cbranchhi4_insn}
  (expr_list:REG_DEAD (reg/v:HI 51 [ j ])
 (int_list:REG_BR_PROB 7 (nil)))
  -> 103)

   LRA deletes insns 3 and 28, and uses r2 in the jump_insn.

   In the reload dump, for pseudo r51, I'm seeing this
subreg regs:
   Frame pointer can not be eliminated anymore
   Spilling non-eliminable hard regs: 28 29
 Spilling r51(28)
   Slot 0 regnos (width = 0):46
   Slot 1 regnos (width = 0):45

   lra_update_fp2sp_elimination calls spill_pseudos with
   HARD_FRAME_POINTER_REGNUM, and that sets reg_renumber[51] to -1.

   Later down the line, process_bb_lives is called with dead_insn_p=true from
   lra_create_lives_ranges_1 on the relevant BB (#8), and df_get_live_out
   on that BB does not contain 51 (even though previous calls to the same BB 
did).
   
Breakpoint 8, process_bb_lives (bb=0x7fffea570240, curr_point=@0x7fffd838: 25, dead_insn_p=true) at gcc/gcc/lra-lives.cc:664

664   function_abi last_call_abi = default_function_abi;
(gdb) n
666   reg_live_out = df_get_live_out (bb);
(gdb)
667   sparseset_clear (pseudos_live);
(gdb) p debug_bitmap(reg_live_out)

first = 0x321c128 current = 0x321c128 indx = 0
0x321c128 next = (nil) prev = (nil) indx = 0
bits = { 28 32 34 43 44 47 48 49 50 }

   process_bb_lives then considers the insn setting 51 (and
   the reload insns LRA created) as dead, and removes them.

BB 8
Insn 67: point = 31, n_alt = -1
Insn 114: point = 31, n_alt = 3
Deleting dead insn 114
deleting insn with uid = 114.
Insn 28: point = 31, n_alt = 1
Deleting dead insn 28
deleting insn with uid = 28.
Insn 113: point = 31, n_alt = 2
Deleting dead insn 113

Same for insn 3 as well

BB 3
Insn 92: point = 40, n_alt = -1
Insn 5: point = 40, n_alt = 1
Insn 4: point = 41, n_alt = 3
Insn 3: point = 42, n_alt = 3
Deleting dead insn 3
deleting insn with uid = 3.

   Yet when it prints "Global pseudo live data has been updated" after
   all this, r51 is live again :(

BB 8:
 livein: 8:

43   44   47   48   49   50   51
 liveout: 8:

28   32   34   43   44   47   48   49   50   51

   Eventually, it assigns 2 to r51, resulting in just the compare and
   branch instruction remaining in the assembly.

   Is this an LRA bug or is the target doing something wrong?

I've reproduced this.  Probably it is a bug with live info update when 
fp->sp elimination became invalid.


I'll start to work on this problem on the next week and hope to have a 
fix soon after that.

[COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-11 Thread Eric Feng via Gcc

Thanks for the feedback! I've incorporated the changes (aside from
expanding test coverage, which I plan on releasing in a follow-up),
rebased, and performed a bootstrap and regtest on
aarch64-unknown-linux-gnu. Since you mentioned that it is good for trunk
with nits fixed and no problems after rebase, the patch has now been pushed. 

Best,
Eric

---

This patch adds known function subclasses for Python/C API functions
PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
optional parameters for
region_model::get_or_create_region_for_heap_alloc, allowing for the
newly allocated region to immediately transition from the start state to
the assumed non-null state in the malloc state machine if desired.
Finally, it adds a new procedure, dg-require-python-h, intended as a
directive in Python-related analyzer tests, to append necessary Python
flags during the tests' build process.

The main warnings we gain in this patch with respect to the known function
subclasses mentioned are leak related. For example:

rc3.c: In function ‘create_py_object’:
│
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
│
   21 |   return list;
  │
  |  ^~~~
│
  ‘create_py_object’: events 1-4
│
|
│
|4 |   PyObject* item = PyLong_FromLong(10);
│
|  |^~~
│
|  ||
│
|  |(1) allocated here
│
|  |(2) when ‘PyLong_FromLong’ succeeds
│
|5 |   PyObject* list = PyList_New(2);
│
|  |~
│
|  ||
│
|  |(3) when ‘PyList_New’ fails
│
|..
│
|   21 |   return list;
│
|  |  
│
|  |  |
│
|  |  (4) ‘item’ leaks here; was allocated at (1)
│

Some concessions were made to
simplify the analysis process when comparing kf_PyList_Append with the
real implementation. In particular, PyList_Append performs some
optimization internally to try and avoid calls to realloc if
possible. For simplicity, we assume that realloc is called every time.
Also, we grow the size by just 1 (to ensure enough space for adding a
new element) rather than abide by the heuristics that the actual implementation
follows.

gcc/analyzer/ChangeLog:
PR analyzer/107646
* call-details.h: New function.
* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
New optional parameters.
* region-model.h (class region_model): New optional parameters.
* sm-malloc.cc (on_realloc_with_move): New function.
(region_model::transition_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
PyList_New, PyList_Append, PyLong_FromLong
* gcc.dg/plugin/plugin.exp: New test.
* lib/target-supports.exp: New procedure.
* gcc.dg/plugin/cpython-plugin-test-2.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/call-details.h   |   4 +
 gcc/analyzer/region-model.cc  |  17 +-
 gcc/analyzer/region-model.h   |  14 +-
 gcc/analyzer/sm-malloc.cc |  42 +
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 722 ++
 .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
 gcc/testsuite/lib/target-supports.exp |  25 +
 8 files changed, 899 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c

diff --git a/gcc/analyzer/call-details.h b/gcc/analyzer/call-details.h
index 24be2247e63..bf2601151ea 100644
--- a/gcc/analyzer/call-details.h
+++ b/gcc/analyzer/call-details.h
@@ -49,6 +49,10 @@ public:
 return POINTER_TYPE_P (get_arg_type (idx));
   }
   bool arg_is_size_p (unsigned idx) const;
+  bool arg_is_integral_p (unsigned idx) const
+  {
+return INTEGRAL_TYPE_P (get_arg_type (idx));
+  }
 
   const gcall *get_call_stmt () const { return m_call; }
   location_t get_location () const;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 094b7af3dbc..aa9fe008b9d 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4991,11 +4991,16 @@ region_model::check_dynamic_size_for_floats (const 
svalue *size_in_bytes,
Use CTXT to complain about tainted sizes.
 
Reuse an existing heap_allocated_region if it's not being referenced by
-   this region_model; otherwise create a new one.  */
+   this region_model; otherwise create a new one.
+
+   Optionally (update_state_machine) transitions the pointer pointing to the
+   heap_allocated_region from start to assumed non-null.  */
 
 const region *
 region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
-  region_mod

Should GCC warn about sizeof(flexible_struct)?

2023-08-11 Thread Alejandro Colomar via Gcc

Hi!

Structures with flexible array members have restrictions about being
used in arrays or within other structures, as the size of the enclosing
aggregate type would be... inconsistent.

In general, sizeof(flexible_struct) is a problematic thing that rarely
means what programmers think it means.  It is not the size of the
structure up to the flexible array member; or expressed using C,
the following can be true:

sizeof(s) != offsetof(s, fam)

See the program at the bottom that demonstrates how this is problematic.

It's true that if one uses

malloc(offseof(s, fam) + sizeof_member(s, fam[0]) * N);

and N is very small (0 or 1 usually), the allocation would be smaller
than the object size, which for GCC seems to be fine, but I'm worried the
standard is not clear enough about its validity[1].

[1]: 

To avoid having UB there, pedantically one would need to call

malloc(MAX(sizeof(s), offseof(s, fam) + sizeof_member(s, fam[0]) * N));

But I think that's the only correct use of sizeof() with structures
containing flexible array members.  So it seems sizeof() by itself is
a valid thing, but when adding it to something else to get the total size,
or doing any arithmetic with it, that's dubious code.

How about some -Wfam-sizeof-arithmetic that would not warn about taking
sizeof(s) but would warn if that sizeof is used in any arithmetic?

Cheers,
Alex

---

$ cat off.c 
#include 
#include 
#include 
#include 
#include 


struct s {
int   i;
char  c;
char  fam[];
};


static inline void *xmalloc(size_t size);


int
main(void)
{
char  *p;
struct s  *s;

printf("sizeof: %zu\n", sizeof(struct s));
printf("offsetof: %zu\n", offsetof(struct s, fam));

puts("\nWith sizeof():");

s = xmalloc(sizeof(struct s) + sizeof("Hello, sizeof!"));
strcpy(s->fam, "Hello, sizeof!");
p = (char *) s + sizeof(struct s);
puts(p);
free(s);

puts("\nWith offsetof(3):");

s = xmalloc(offsetof(struct s, fam) + sizeof("Hello, offsetof!"));
strcpy(s->fam, "Hello, offsetof!");
p = (char *) s + offsetof(struct s, fam);
puts(p);
free(s);

exit(EXIT_SUCCESS);
}


static inline void *
xmalloc(size_t size)
{
void  *p;

p = malloc(size);
if (p == NULL)
err(EXIT_FAILURE, "malloc");
return p;
}
$ gcc-13 -Wall -Wextra -Wpadded -fanalyzer off.c
off.c:12:1: warning: padding struct size to alignment boundary with 3 bytes 
[-Wpadded]
   12 | };
  | ^


The only warning I know that is triggered in the code above is -Wpadded,
which is related to this problem, but I think there should be something
to warn about sizeof() in this context.


-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature

Re: Should GCC warn about sizeof(flexible_struct)?

2023-08-11 Thread Alejandro Colomar via Gcc

On 2023-08-11 20:29, Alejandro Colomar wrote:
> Hi!
> 
> Structures with flexible array members have restrictions about being
> used in arrays or within other structures, as the size of the enclosing
> aggregate type would be... inconsistent.
> 
> In general, sizeof(flexible_struct) is a problematic thing that rarely
> means what programmers think it means.  It is not the size of the
> structure up to the flexible array member; or expressed using C,
> the following can be true:
> 
>   sizeof(s) != offsetof(s, fam)
> 
> See the program at the bottom that demonstrates how this is problematic.
> 
> It's true that if one uses
> 
>   malloc(offseof(s, fam) + sizeof_member(s, fam[0]) * N);
> 
> and N is very small (0 or 1 usually), the allocation would be smaller
> than the object size, which for GCC seems to be fine, but I'm worried the
> standard is not clear enough about its validity[1].
> 
> [1]: 
> 
> To avoid having UB there, pedantically one would need to call
> 
>   malloc(MAX(sizeof(s), offseof(s, fam) + sizeof_member(s, fam[0]) * N));
> 
> But I think that's the only correct use of sizeof() with structures
> containing flexible array members.  So it seems sizeof() by itself is
> a valid thing, but when adding it to something else to get the total size,
> or doing any arithmetic with it, that's dubious code.
> 
> How about some -Wfam-sizeof-arithmetic that would not warn about taking
> sizeof(s) but would warn if that sizeof is used in any arithmetic?
> 
> Cheers,
> Alex
> 
> ---
> 
> $ cat off.c 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> 
> struct s {
>   int   i;
>   char  c;
>   char  fam[];
> };
> 
> 
> static inline void *xmalloc(size_t size);
> 
> 
> int
> main(void)
> {
>   char  *p;
>   struct s  *s;
> 
>   printf("sizeof: %zu\n", sizeof(struct s));
>   printf("offsetof: %zu\n", offsetof(struct s, fam));
> 
>   puts("\nWith sizeof():");
> 
>   s = xmalloc(sizeof(struct s) + sizeof("Hello, sizeof!"));
>   strcpy(s->fam, "Hello, sizeof!");
>   p = (char *) s + sizeof(struct s);
>   puts(p);
>   free(s);
> 
>   puts("\nWith offsetof(3):");
> 
>   s = xmalloc(offsetof(struct s, fam) + sizeof("Hello, offsetof!"));
>   strcpy(s->fam, "Hello, offsetof!");
>   p = (char *) s + offsetof(struct s, fam);
>   puts(p);
>   free(s);
> 
>   exit(EXIT_SUCCESS);
> }
> 
> 
> static inline void *
> xmalloc(size_t size)
> {
>   void  *p;
> 
>   p = malloc(size);
>   if (p == NULL)
>   err(EXIT_FAILURE, "malloc");
>   return p;
> }
> $ gcc-13 -Wall -Wextra -Wpadded -fanalyzer off.c
> off.c:12:1: warning: padding struct size to alignment boundary with 3 bytes 
> [-Wpadded]
>12 | };
>   | ^

I forgot to paste the output of the program:)

$ ./a.out 
sizeof: 8
offsetof: 5

With sizeof():
lo, sizeof!

With offsetof(3):
Hello, offsetof!

> 
> 
> The only warning I know that is triggered in the code above is -Wpadded,
> which is related to this problem, but I think there should be something
> to warn about sizeof() in this context.
> 
> 

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



OpenPGP_signature
Description: OpenPGP digital signature

Re: [COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-11 Thread Eric Feng via Gcc

I've noticed there were still some strange indentations in the last
patch ... however, I think I've finally figured out a sane formatting
solution for me (fingers crossed). I will address them in the
follow-up patch at the same time as adding more test coverage.

---

In case, anyone else using VSCode has been having issues with
formatting according to GNU/GCC conventions, these are the relevant
formatting settings that I've found work for me. Assuming the C/C++
extension is installed, then in settings.json:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
TabWidth: 8, IndentWidth: 8 }"

Just setting the base style to GNU formats everything correctly except
for the fact that indentation defaults to spaces (which is what I've
been struggling with fixing manually in the last few patches). The
rest of the settings are for replacing blocks of 8 spaces with tabs
(which is a requirement in check_GNU_style). In combination, this
works for everything except for header files for some reason, but I'll
defer that battle to another day.

On Fri, Aug 11, 2023 at 1:47 PM Eric Feng  wrote:
>
> Thanks for the feedback! I've incorporated the changes (aside from
> expanding test coverage, which I plan on releasing in a follow-up),
> rebased, and performed a bootstrap and regtest on
> aarch64-unknown-linux-gnu. Since you mentioned that it is good for trunk
> with nits fixed and no problems after rebase, the patch has now been pushed.
>
> Best,
> Eric
>
> ---
>
> This patch adds known function subclasses for Python/C API functions
> PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
> optional parameters for
> region_model::get_or_create_region_for_heap_alloc, allowing for the
> newly allocated region to immediately transition from the start state to
> the assumed non-null state in the malloc state machine if desired.
> Finally, it adds a new procedure, dg-require-python-h, intended as a
> directive in Python-related analyzer tests, to append necessary Python
> flags during the tests' build process.
>
> The main warnings we gain in this patch with respect to the known function
> subclasses mentioned are leak related. For example:
>
> rc3.c: In function ‘create_py_object’:
> │
> rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
> │
>21 |   return list;
>   │
>   |  ^~~~
> │
>   ‘create_py_object’: events 1-4
> │
> |
> │
> |4 |   PyObject* item = PyLong_FromLong(10);
> │
> |  |^~~
> │
> |  ||
> │
> |  |(1) allocated here
> │
> |  |(2) when ‘PyLong_FromLong’ succeeds
> │
> |5 |   PyObject* list = PyList_New(2);
> │
> |  |~
> │
> |  ||
> │
> |  |(3) when ‘PyList_New’ fails
> │
> |..
> │
> |   21 |   return list;
> │
> |  |  
> │
> |  |  |
> │
> |  |  (4) ‘item’ leaks here; was allocated at (1)
> │
>
> Some concessions were made to
> simplify the analysis process when comparing kf_PyList_Append with the
> real implementation. In particular, PyList_Append performs some
> optimization internally to try and avoid calls to realloc if
> possible. For simplicity, we assume that realloc is called every time.
> Also, we grow the size by just 1 (to ensure enough space for adding a
> new element) rather than abide by the heuristics that the actual 
> implementation
> follows.
>
> gcc/analyzer/ChangeLog:
> PR analyzer/107646
> * call-details.h: New function.
> * region-model.cc (region_model::get_or_create_region_for_heap_alloc):
> New optional parameters.
> * region-model.h (class region_model): New optional parameters.
> * sm-malloc.cc (on_realloc_with_move): New function.
> (region_model::transition_ptr_sval_non_null): New function.
>
> gcc/testsuite/ChangeLog:
> PR analyzer/107646
> * gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
> PyList_New, PyList_Append, PyLong_FromLong
> * gcc.dg/plugin/plugin.exp: New test.
> * lib/target-supports.exp: New procedure.
> * gcc.dg/plugin/cpython-plugin-test-2.c: New test.
>
> Signed-off-by: Eric Feng 
> ---
>  gcc/analyzer/call-details.h   |   4 +
>  gcc/analyzer/region-model.cc  |  17 +-
>  gcc/analyzer/region-model.h   |  14 +-
>  gcc/analyzer/sm-malloc.cc |  42 +
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 722 ++
>  .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
>  gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
>  gcc/testsuite/lib/target-supports.exp |  25 +
>  8 files changed, 899 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
>
> diff

gcc-12-20230811 is now available

2023-08-11 Thread GCC Administrator via Gcc

Snapshot gcc-12-20230811 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20230811/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-12 revision 1cb272a8b1669e438aadf4a95da3979ac07ca9ce

You'll find:

 gcc-12-20230811.tar.xz   Complete GCC

  SHA256=be5db2ea1b8c645884c4321fc9d91a02bb986b76f326fd47ebe09fbfc14da78c
  SHA1=3e550eb3059692665730491c0a4f8a4f96110ae9

Diffs from 12-20230804 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

Re: Mapping of TREE_CODE to tree_node

2023-08-11 Thread Aaron Lorey via Gcc

Am Mo., 3. Juli 2023 um 02:50 Uhr schrieb Andrew Pinski :
>
> On Sun, Jul 2, 2023 at 5:48 PM Aaron Lorey via Gcc  wrote:
> >
> > Am Mo., 26. Juni 2023 um 20:09 Uhr schrieb David Malcolm 
> > :
> > >
> > > On Mon, 2023-06-26 at 18:59 +0200, Aaron Lorey via Gcc wrote:
> > > > Hello,
> > > >
> > > > this is the first time I am writing to a mailing list. I've tried
> > > > researching the normal procedure but nothing special seems to be
> > > > required.
> > > >
> > > > I'm currently trying to do a complete graph-discovery of GCC's symtab
> > > > /
> > > > tree_nodes to dump the full internal representation of the
> > > > compilation
> > > > unit. Gitlab: https://gitlab.com/graph-prog/code-database
> > > >
> > > > It is not exceptionally heavy but also not very easy to serialize the
> > > > internal state to disk. I think this task was simply not considered
> > > > in the
> > > > design.
> > > >
> > > > Reason for writing to the mailing list are the troubles in connecting
> > > > the
> > > > TREE_CODE enumeration to the appropriate struct tree_node memory
> > > > layout
> > > > without guessing.
> > > >
> > > > Can you provide a mapping of TREE_CODE to tree_node memory layout?
> > >
> > > I don't know that such a mapping exists directly, but have a look at
> > > the functions "tree_code_size" and "tree_size" defined in gcc/tree.cc.
> > >
> > > You might also find the LTO streaming code of interest; see gcc/lto-
> > > streamer-{in,out}.cc
> > >
> > > Hope this is helpful
> > > Dave
> > >
> > >
> >
> > Thank you for your reply.
> >
> > The tree_size() and tree_code_size() functions are useful, although 
> > incomplete.
> >
> > If I understand correctly, the link time optimization works on the
> > GIMPLE representation. The original syntax tree and symbol table would
> > be preferable.
>
> You could also look into the module support in the C++ front-end,
> `gcc/cp/module.cc ` which does store out the original trees and such.
>
> Thanks,
> Andrew
>
> >
> > Andrew's suggestion might be more what I'm looking for.

I've now managed to dump the syntax tree of the compilation unit
(tree_function_decl.saved_tree -> tree_exp.operands ->
tree_statement_list.nodes). Thank you very much for the help!

In order to print out the original code, I need to know which program
code was translated to the individual nodes. Is there a chance to get
the original tokens (or the offsets in the program code file) per
tree_node without modifying the parser?

Re: _Nullable and _Nonnull in GCC's analyzer (was: [PATCH v5] libio: Add nonnull attribute for most FILE * arguments in stdio.h)

2023-08-11 Thread enh via Gcc

On Wed, Aug 9, 2023 at 12:26 AM Martin Uecker  wrote:
>
> Am Dienstag, dem 08.08.2023 um 17:14 -0700 schrieb enh:
> > (bionic maintainer here, mostly by accident...)
> >
> > yeah, we tried the GCC attributes once before with _disastrous_
> > results (because GCC saw it as an excuse to delete explicit null
> > checks, it broke all kinds of things).
>
> Thanks! I am currently exploring different options and what
> could be done to improve the situation from the language
> as well as compile side.  It looks we have some partial
> solution but they do not work quite right or do not
> fit together perfectly.
>
>
> >  the clang attributes are
> > "better" in that they don't confuse two entirely separate notions ...
> > but they're not "good" because the analysis is so limited. i think we
> > were hoping for something more like the "used but not set" kind of
> > diagnostic, but afaict it really only catches _direct_ use of a
> > literal nullptr. so:
> >
> >   foo(nullptr); // caught
> >
> >   void* p = nullptr;
> >   foo(p); // not caught
> >
> > without getting on to anything fancy like:
> >
> >   void* p;
> >   if (a) p = bar();
> >   foo(p);
> >
> > we've done all the annotations anyway, but we're not expecting to
> > actually catch many bugs with them, and in fact did not catch any real
> > bugs in AOSP while adding the annotations. (though we did find several
> > "you're not _wrong_, but ..." bits of code :-) )
> >
> > what i really want for christmas is the annotation that lets me say
> > "this size_t argument tells you the size of this array argument" and
> > actually does something usefully fortify-like with that.
> >
>
> What is your opinion about the access attribute?
>
> https://godbolt.org/z/PPfajefvP
>
> it is a bit cumbersome to use, but one can use [static]
> instead, which gives you the same static warnings.

yeah, "that's hard to read" was actually my first reaction. maybe it's
just because i'm a libc maintainer used to the printf_like attribute,
but i actually find the regular __attribute__() style more readable.
you probably need to ask people who _consume_ more headers than they
write :-)

> [static] does not work with __builtin_dynamic_object_size,
> but maybe this could be changed (there is a bug filed.)
>
> I am not sure whether [static] should imply [[gnu::nonnull]]
> which would then also trigger the optimization. I think
> clang uses it for optimization.

yeah, that was what made us revert the old gcc nonnull annotations ---
we don't have any use case for "please use this to omit checks"
because we're generally dealing with public API. having the compiler
dead-code eliminate our attempts to return sensible errors was counter
to our goals, and seems like it would be in any place where we'd use a
bounds annotation too --- it'll be those worried about security (who
want to fail fast, and _before_ adding a potentially caller-controlled
index to an array base address!) who i'd expect to see using this kind
of thing first.

> Martin
>
>
> >  i've seen
> > proposals like 
> > https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854/
> > but i haven't seen anything i can use yet, so you -- who do use GCC
> > which aiui has something along these lines already -- will find out
> > what's good/bad about it before i do...
>
>
>
> >
> > On Tue, Aug 8, 2023 at 3:01 AM Martin Uecker  wrote:
> > >
> > > Am Montag, dem 10.07.2023 um 22:16 +0200 schrieb Alejandro Colomar via 
> > > Gcc:
> > > > On 7/10/23 22:14, Alejandro Colomar wrote:
> > > > > [CC += Andrew]
> > > > >
> > > > > Hi Xi, Andrew,
> > > > >
> > > > > On 7/10/23 20:41, Xi Ruoyao wrote:
> > > > > > Maybe we should have a weaker version of nonnull which only 
> > > > > > performs the
> > > > > > diagnostic, not the optimization.  But it looks like they hate the 
> > > > > > idea:
> > > > > > https://gcc.gnu.org/PR110617.
> > > > > >
> > > > > This is the one thing that makes me use both Clang and GCC to compile,
> > > > > because while any of them would be enough to build, I want as much
> > > > > static analysis as I can get, and so I want -fanalyzer (so I need 
> > > > > GCC),
> > > > > but I also use _Nullable (so I need Clang).
> > > > >
> > > > > If GCC had support for _Nullable, I would have in GCC the superset of
> > > > > features that I need from both in a single vendor.  Moreover, Clang's
> > > > > static analyzer is brain-damaged (sorry, but it doesn't have a simple
> > > > > command line to run it, contrary to GCC's easy -fanalyzer), so having
> > > > > GCC's analyzer get over those _Nullable qualifiers would be great.
> > > > >
> > > > > Clang's _Nullable (and _Nonnull) are not very useful outside of 
> > > > > analyzer
> > > > > mode, as there are many cases where the compiler doesn't have enough
> > > > > information, and the analyzer can get rid of false negatives and
> > > > > positives.  See: 
> > > > >
> > > > > I'll back the ask for the qualifiers

Re: Weak attribute function limitation

Re: [Predicated Ins vs Branches] O3 and PGO result in 2x performance drop relative to O2

[pushed][LRA]: Implement output stack pointer reloads

[pushed][LRA]: Implement output stack pointer reloads

Re: LRA for avr: help with FP and elimination

[COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]

Should GCC warn about sizeof(flexible_struct)?

Re: Should GCC warn about sizeof(flexible_struct)?

Re: [COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]

gcc-12-20230811 is now available

Re: Mapping of TREE_CODE to tree_node

Re: _Nullable and _Nonnull in GCC's analyzer (was: [PATCH v5] libio: Add nonnull attribute for most FILE * arguments in stdio.h)

12 matches

Site Navigation

Mail list logo

Footer information