On 6/30/23 00:45, Mike Pattrick wrote: > On Thu, Jun 29, 2023 at 3:10 PM Ilya Maximets <i.maxim...@ovn.org> wrote: >> >> On 6/9/23 17:05, Mike Pattrick wrote: >>> Several xlate actions used in recursive translation currently store a >>> large amount of information on the stack. This can result in handler >>> threads quickly running out of stack space despite before >>> xlate_resubmit_resource_check() is able to terminate translation. This >>> patch reduces stack usage by over 3kb from several translation actions. >> >> I'm assuming that this number is mostly about patch_port_output function, >> right? I see only ~30% stack usage reduction for the main recursive >> do_xlate_actions function. It went from 720 to 512 bytes. >> >> Or am I calculating it wrong? I'm using -fstack-size compiler option. > > I was using GDB for this, finding the difference in the stack pointer > between calls to the same function during recursion. The intention of > this patch is to get to xlate_resubmit_resource_check() without > smashing the stack, no individual functions stack usage is as > important as the holistic stack usage while recursing. > > For example, without patch: > > (gdb) fram 1 > #1 0x000000000044dee3 in xlate_table_action (...) at > ofproto/ofproto-dpif-xlate.c:4539 > 4539 if (xlate_resubmit_resource_check(ctx)) { > (gdb) p $rsp > $4 = (void *) 0x7f888e7aa700 > (gdb) frame 8 > #8 xlate_table_action (...) at ofproto/ofproto-dpif-xlate.c:4584 > 4584 xlate_recursively(ctx, rule, table_id <= old_table_id, > (gdb) p $rsp > $5 = (void *) 0x7f888e7aba70 > (gdb) p 0x7f888e7aba70-0x7f888e7aa700 > $6 = 4976 > > With patch: > > #1 0x000000000044e3bc in xlate_table_action (...) at > ofproto/ofproto-dpif-xlate.c:4552 > 4552 if (xlate_resubmit_resource_check(ctx)) { > (gdb) p $rsp > $4 = (void *) 0x7f78e69a9860 > (gdb) frame 9 > #9 0x000000000044ec00 in xlate_table_action (...) at > ofproto/ofproto-dpif-xlate.c:7212 > 7212 xlate_table_action(ctx, ctx->xin->flow.in_port.ofp_port, > (gdb) p $rsp > $5 = (void *) 0x7f78e69a9c60 > (gdb) p 0x7f78e69a9c60-0x7f78e69a9860 > $6 = 1024 > > >> >>> This patch also moves some trace function from do_xlate_actions into its >>> own function to reduce stack usage. >> >> This doesn't seem to work. With GCC and -O2, i.e. the standard optimization >> level, this new function is always inlined for me, and there is no real >> difference in the produced code and the stack usage. > > When I compile with O2, I get the following code generation: > > Without patch: > 000000000044cda0 <do_xlate_actions>: > [...] > 44cdad: 48 81 ec 78 02 00 00 sub rsp,0x278 > > With patch: > 000000000044cff0 <do_xlate_actions>: > [...] > 44cffd: 48 81 ec 98 01 00 00 sub rsp,0x198 > > Do you get different results?
If we're just looking at the first sub in the function, I get the same number (0x278) on current master and 0x148 with your patch applied. Moving the port name map to heap reduces the value to 0x128. The xlate_trace function is fully inlined according to objdump. > >> >> In order to achieve actual reduction of stack usage there we need to move >> to dynamic allocation of the port name map.q >> >> There are few more functions that are fully or partially inlined into >> do_xlate_actions with default compiler options: >> >> - xlate_output_reg_action: Allocates a large mf_subvalue on stack. >> - xlate_sample_action: Allocates huge struct flow_tnl(!) on stack. >> - compose_conntrack_action: Allocates mf_subvalue. >> - xlate_check_pkt_larger: Allocates mf_subvalue as well. >> >> With these moved to heap, it should be possible to reduce the stack usage >> by about 60% for do_xlate_actions function. > > These don't inline for me on O2. What GCC version are you using? If I do changes in the functions above, the number is dropped to 0xd8. I'm building with gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2) on F38. CFLAGS="-g -O2". Best regards, Ilya Maximets. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev