[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Michael Meissner changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #21 from Michael Meissner ---
Fixed in trunk.  Backported to GCC 13, GCC 12, and GCC 11.  The bug does not
show up in GCC 10.  Closing bug.
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

--- Comment #20 from CVS Commits ---
The releases/gcc-11 branch has been updated by Michael Meissner:

https://gcc.gnu.org/g:1896ab1cab76df1ebf12b876f696eac23436170b

commit r11-10895-g1896ab1cab76df1ebf12b876f696eac23436170b
Author: Michael Meissner
Date:   Wed Jul 5 15:50:15 2023 -0400

    Fix power10 fusion bug with prefixed loads, PR target/105325

    (cherry picked from commit 370de1488a9a49956c47e5ec8c8f1489b4314a34)

    Co-Authored-By: Aaron Sawdey
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

--- Comment #19 from CVS Commits ---
The releases/gcc-12 branch has been updated by Michael Meissner:

https://gcc.gnu.org/g:7fc075626012b9fd09b20049d8681f2d72395f5c

commit r12-9755-g7fc075626012b9fd09b20049d8681f2d72395f5c
Author: Michael Meissner
Date:   Wed Jul 5 14:08:58 2023 -0400

    Fix power10 fusion bug with prefixed loads, PR target/105325

    (cherry picked from commit 370de1488a9a49956c47e5ec8c8f1489b4314a34)

    Co-Authored-By: Aaron Sawdey
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

--- Comment #18 from CVS Commits ---
The releases/gcc-13 branch has been updated by Michael Meissner:

https://gcc.gnu.org/g:68aa17cff9279d2f3acebaf4d5cb9ababe743046

commit r13-7535-g68aa17cff9279d2f3acebaf4d5cb9ababe743046
Author: Michael Meissner
Date:   Wed Jul 5 12:44:55 2023 -0400

    Fix power10 fusion bug with prefixed loads, PR target/105325

    (cherry picked from commit 370de1488a9a49956c47e5ec8c8f1489b4314a34)

    Co-Authored-By: Aaron Sawdey
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

--- Comment #17 from CVS Commits ---
The master branch has been updated by Michael Meissner:

https://gcc.gnu.org/g:370de1488a9a49956c47e5ec8c8f1489b4314a34

commit r14-2049-g370de1488a9a49956c47e5ec8c8f1489b4314a34
Author: Michael Meissner
Date:   Fri Jun 23 11:32:39 2023 -0400

    Fix power10 fusion bug with prefixed loads, PR target/105325

    This change fixes PR target/105325.

    PR target/105325 is a bug where an invalid lwa instruction is generated
    due to power10 fusion of a load instruction to a GPR and a compare
    immediate instruction with the immediate being -1, 0, or 1.

    In some cases, when the load instruction is done, the GCC compiler would
    generate a load instruction with an offset that was too large to fit into
    the normal load instruction.

    In particular, loads from the stack might originally have a small offset,
    so that the load is not a prefixed load.  However, after the stack is set
    up, and register allocation has been done, the offset now is large enough
    that we would have to use a prefixed load instruction.

    The support for prefixed loads did not consider that patterns with a
    fused load and compare might have a prefixed address.  Without this
    support, the proper prefixed load won't be generated.

    In the original code, when the split2 pass is run after reload has
    finished, the ds_form_mem_operand predicate that was used for lwa and ld
    no longer returns true.  When the pattern was created,
    ds_form_mem_operand recognized the insn as being valid since the offset
    was small.  But after register allocation, ds_form_mem_operand did not
    return true.  Because it didn't return true, the insn could not be split.
    Since the insn was not split and the prefix support did not indicate a
    prefixed instruction was used, the wrong load is generated.

    The solution involves:

    1)  Don't use ds_form_mem_operand for ld and lwa, always use
        non_update_memory_operand.

    2)  Delete ds_form_mem_operand since it is no longer used.

    3)  Use the "YZ" constraints for ld/lwa instead of "m".

    4)  If we don't need to sign extend the lwa, convert it to lwz, and use
        cmpwi instead of cmpdi.  Adjust the insn name to reflect the code
        generated.

    5)  Ensure that the insn using lwa will be recognized as having a
        prefixed operand (and hence the insn length will be 16 bytes instead
        of 8 bytes).

        5a) Set the prefixed and maybe_prefixed attributes to know that
            fused_load_cmpi are also load insns;

        5b) In the case where we are just setting CC and not using the memory
            afterward, set the clobber to use a DI register, and put an
            explicit sign_extend operation in the split;

        5c) Set the sign_extend attribute to "yes" for lwa.

        5d) 5a-5c are the things that prefixed_load_p in rs6000.cc checks to
            ensure that lwa is treated as a ds-form instruction and not as a
            d-form instruction (i.e. lwz).

    6)  Add a new test case for this case.

    7)  Adjust the insn counts in fusion-p10-ldcmpi.c.  Because we are no
        longer using ds_form_mem_operand, the ld and lwa instructions will
        fuse x-form (reg+reg) addresses in addition to ds-form (reg+offset or
        reg).

    2023-06-23   Michael Meissner

    gcc/

            PR target/105325
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems
            that allowed prefixed lwa to be generated.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/predicates.md (ds_form_mem_operand): Delete.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for
            load plus compare immediate fused insns.
            (maybe_prefixed): Likewise.

    gcc/testsuite/

            PR target/105325
            * g++.target/powerpc/pr105325.C: New test.
            * gcc.target/powerpc/fusion-p10-ldcmpi.c: Update insn counts.

    Co-Authored-By: Aaron Sawdey
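Items 4 and 5 of the commit message can be pictured with a small C model.
This is only an illustrative sketch, not code from GCC: the struct
fused_choice and the function choose_fused_load_cmpi are hypothetical names,
the offset is assumed to fit a prefixed displacement, and the 8 and 16 byte
lengths are simply the figures quoted in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one fused load + compare; not a GCC type.  */
struct fused_choice
{
  const char *load;     /* load mnemonic */
  const char *compare;  /* compare mnemonic */
  int length;           /* insn length in bytes, as quoted in the commit */
};

/* Hypothetical model of items 4 and 5: choose the load/compare pair and
   the insn length from the need for sign extension and the final offset.
   Assumes OFFSET fits the displacement of a prefixed load.  */
static struct fused_choice
choose_fused_load_cmpi (bool need_sign_extend, int64_t offset)
{
  struct fused_choice c;
  bool prefixed;

  if (need_sign_extend)
    {
      /* lwa is DS-form: signed 16-bit offset that is a multiple of 4.  */
      prefixed = offset < -32768 || offset > 32764 || (offset & 3) != 0;
      c.load = prefixed ? "plwa" : "lwa";
      c.compare = "cmpdi";
    }
  else
    {
      /* lwz is D-form: any signed 16-bit offset.  */
      prefixed = offset < -32768 || offset > 32767;
      c.load = prefixed ? "plwz" : "lwz";
      c.compare = "cmpwi";
    }

  c.length = prefixed ? 16 : 8;
  return c;
}
```

With the offset from this PR (40036) and sign extension required, the model
picks plwa with cmpdi and a 16 byte length; with the pre-reload offset of -12
and no sign extension it picks lwz with cmpwi and 8 bytes.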
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Peter Bergner changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616805.html

--- Comment #16 from Peter Bergner ---
Another test case from Nick's dup bugzilla (PR108239):

--- test.c ---
// powerpc64le-linux-gnu-gcc -O2 -mcpu=power10 -mno-pcrel -c test.c
#include <stdint.h>

static inline uint32_t readl(uint32_t *addr)
{
	uint32_t ret;
	__asm__ __volatile__("lwz %0,%1" : "=r" (ret) : "m" (*addr));
	return ret;
}

int test(void *addr)
{
	return readl(addr + 0x8024);
}
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Peter Bergner changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |npiggin at gmail dot com

--- Comment #15 from Peter Bergner ---
*** Bug 108239 has been marked as a duplicate of this bug. ***
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|13.0                        |13.2

--- Comment #14 from Richard Biener ---
GCC 13.1 is being released, retargeting bugs to GCC 13.2.
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Peter Bergner changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bergner at gcc dot gnu.org
   Target Milestone|---                         |13.0
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Michael Meissner changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|acsawdey at gcc dot gnu.org |meissner at gcc dot gnu.org
             Status|NEW                         |ASSIGNED
                 CC|                            |meissner at gcc dot gnu.org

--- Comment #13 from Michael Meissner ---
Aaron is not working on GCC any longer, so I'm taking over this bug.
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

acsawdey at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |acsawdey at gcc dot gnu.org

--- Comment #12 from acsawdey at gcc dot gnu.org ---
I do have a patch for this one that has been sitting around that I forgot
about, looking at reviving that to at least post.
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

--- Comment #11 from Segher Boessenkool ---
It should use "YZ" as constraint (Y is DS-mode, Z is X-mode).  The predicate
should probably be lwa_operand ("lwau" does not exist, that's the irregularity
this predicate is for).
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

--- Comment #10 from Kewen Lin ---
(In reply to Jakub Jelinek from comment #9)
> where it no longer satisfies the predicate but does satisfy the constraint.
> It is unclear if there is any matching constraint for ds_form_mem_operand,
> maybe wY?  But not really sure about it.

As the comments above wY say, it's mainly for VSX instructions and it also
checks for no update, so it does not seem a perfect fit here?

The current ds_form_mem_operand predicate also looks contradictory to the
split condition below:

  address_is_non_pfx_d_or_x (XEXP (operands[1], 0), SImode, NON_PREFIXED_DS))

ds_form_mem_operand requires that address_to_insn_form always returns
INSN_FORM_DS.  address_is_non_pfx_d_or_x also calls address_to_insn_form, so
it never has the chance to return false, since address_to_insn_form can only
return INSN_FORM_DS while the predicate guards the pattern.

The snippet below makes the split work and the failure go away.

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index b1fcc69bb60..a1b58dfa0c9 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1099,7 +1099,11 @@ (define_predicate "ds_form_mem_operand"
 
   rtx addr = XEXP (op, 0);
 
-  return address_to_insn_form (addr, mode, NON_PREFIXED_DS) == INSN_FORM_DS;
+  enum insn_form form = address_to_insn_form (addr, mode, NON_PREFIXED_DS);
+
+  return form == INSN_FORM_DS
+        || (reload_completed && form == INSN_FORM_PREFIXED_NUMERIC);
+
 })
 
 ;; Return 1 if the operand, used inside a MEM, is a SYMBOL_REF.

But as Jakub noted, I'm not sure reload can guarantee that the address
satisfies this updated predicate under the unmatched constraint "m".
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek ---
I'd say the bug is that the various instructions that use the
ds_form_mem_operand predicate don't use a corresponding constraint.
So, during combine:

(insn 8 7 9 2 (parallel [
            (set (reg:CC 120)
                (compare:CC (mem/c:SI (plus:DI (reg/f:DI 110 sfp)
                            (const_int -12 [0xfff4])) [1 MEM[(struct Ath__array1D *)&m + 4B]._current+0 S4 A32])
                    (const_int 0 [0])))
            (clobber (scratch:SI))
        ]) "pr105325.C":11:30 2295 {*lwa_cmpdi_cr0_SI_clobber_CC_none}
     (nil))

is matched, as the offset is signed 16-bit that is a multiple of 4.
But as it uses the "m" constraint and LRA only cares about constraints, not
predicates, it is reloaded as

(insn 8 7 9 2 (parallel [
            (set (reg:CC 100 0 [120])
                (compare:CC (mem/c:SI (plus:DI (reg/f:DI 1 1)
                            (const_int 40036 [0x9c64])) [1 MEM[(struct Ath__array1D *)&m + 4B]._current+0 S4 A32])
                    (const_int 0 [0])))
            (clobber (reg:SI 9 9 [125]))
        ]) "pr105325.C":11:30 2295 {*lwa_cmpdi_cr0_SI_clobber_CC_none}
     (nil))

where it no longer satisfies the predicate but does satisfy the constraint.
It is unclear if there is any matching constraint for ds_form_mem_operand,
maybe wY?  But not really sure about it.
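The range condition in comment #9 can be written out as a small C check.
This is only a sketch with hypothetical helper names (fits_ds_form and
fits_prefixed are not GCC functions); it assumes the DS-form rule stated in
the comment (signed 16-bit offset that is a multiple of 4) and the signed
34-bit displacement of Power10 prefixed loads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: true if OFFSET can be encoded in a
   DS-form load such as lwa or ld (signed 16-bit, low two bits zero).  */
static bool
fits_ds_form (int64_t offset)
{
  return offset >= -32768 && offset <= 32764 && (offset & 3) == 0;
}

/* Hypothetical helper, not GCC code: true if OFFSET fits the signed 34-bit
   displacement of a prefixed load such as plwa or pld.  */
static bool
fits_prefixed (int64_t offset)
{
  return offset >= -(INT64_C (1) << 33) && offset < (INT64_C (1) << 33);
}
```

The offset seen during combine (-12) passes fits_ds_form, while the offset
after reload (40036, i.e. 0x9c64) fails it and only passes fits_prefixed,
which is the case the assembler rejects for a plain lwa.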
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

--- Comment #8 from Martin Liška ---
The problematic instruction is:

        lwa 9,40036(1)

Diff between power9 and power10:

diff -u good.s bad.s
--- good.s      2022-04-21 13:01:23.844042178 +0200
+++ bad.s       2022-04-21 13:01:28.544026646 +0200
@@ -1,5 +1,5 @@
        .file   "extmain.cpp.ii"
-       .machine power9
+       .machine power10
        .abiversion 2
        .section        ".text"
        .section        ".toc","aw"
...
 .LCFI0:
        ld 9,.LC0@toc(2)
+       ld 10,0(9)
+       pstd 10,40040(1)
        li 10,0
-       ori 10,10,0x9c68
-       add 10,10,1
-       ld 8,0(9)
-       std 8,0(10)
-       li 8,0
-       li 9,0
-       ori 9,9,0x9c64
-       lwzx 9,9,1
-       cmpwi 0,9,0
+       lwa 9,40036(1)
+       cmpdi 0,9,0
        beq 0,.L1
 .L3:
...
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

--- Comment #7 from Martin Liška ---
Reduced test-case:

$ cat extmain.cpp.ii
struct Ath__array1D {
  int _current;
  int getCnt() { return _current; }
};
struct extMeasure {
  int _mapTable[1];
  Ath__array1D _metRCTable;
};
void measureRC() {
  extMeasure m;
  for (; m._metRCTable.getCnt();)
    for (;;)
      ;
}

$ powerpc64le-suse-linux-g++ extmain.cpp.ii -c -mcpu=power10 -O -fstack-protector-all
/tmp/ccP8TsF3.s: Assembler messages:
/tmp/ccP8TsF3.s:21: Error: operand out of range (0x9c64 is not between 0x8000 and 0x7ffc)
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #6 from Martin Liška ---
All right, reducing that right now..
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Kewen Lin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org

--- Comment #5 from Kewen Lin ---
I couldn't reproduce this with either trunk or the latest gcc11 branch (with
binutils 2.37).  Then I noticed that -O3 -mcpu=power10 isn't enough for the
reproduction; it needs an extra -fstack-protector-all.

With -fstack-protector-all, I found that both GCC11 and trunk fail.
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

--- Comment #4 from Martin Liška ---
Thanks.

However, I cannot reproduce it with a cross compiler:

$ powerpc64le-suse-linux-g++ -v
...
gcc version 11.2.1 20220316 [revision 6a1150d1524aeda3381b2171712e1a6611d441d6] (SUSE Linux)

Can you please reduce the test-case:
https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

--- Comment #3 from Joel Stanley ---
Created attachment 52843
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52843&action=edit
assembly
[Bug target/105325] power10: Error: operand out of range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-04-21
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #2 from Martin Liška ---
Can you please share the assembly file (you can use --save-temps)?