Richard Biener <[email protected]> writes:
> On Wed, 22 Oct 2025, Richard Sandiford wrote:
>
>> Richard Biener <[email protected]> writes:
>> > The following tries to addess one shortcoming of the SLP permute
>> > optimization phase which is that it tries to refuse layouts that
>> > cannot revert to layout 0 on an edge even if layout 0 isn't actually
>> > valid.  This hurts in particular on VLA vector targets where many
>> > permutes cannot be code generated.  While for fixed-length targets
>> > costs eventually sort things out in a way to chose a "no permutation"
>> > setting the above check prevents VLA targets from arriving there.
>> 
>> I'm not sure this is the right way to go.  The subpass is supposed
>> to be an optimisation pass rather than a legitimisation/"make something
>> vectorisable" pass.  The cost function is purely aimed at reducing
>> permutations rather than at turning unvectorisable graphs into
>> vectorisation ones.  If the pass happens to make something vectorisable,
>> that happens by chance rather than by design.
>
> Note while the change turns something not vectorizable into something
> vectorizable it does so by enabling permute optimization to see that
> a load permutation can be fully elided.  This currently does not work
> because of the imposed "the reverse permute has to be code generatable"
> check(s).

Yeah.  But the purpose of the pass is to optimise permutations in
cases where the original form is vectorisable, rather than to search
for a vectorisable form.

> The existing handling of [in_layout_i == 0 &&] out_layout_i == 0 for
> a not code generatable permute made me think that the fact that
> layout 0 isn't code generatable shouldn't prevent us from arriving
> at the optimal (just from a cost perspective) layout configuration.
>
> To this extent, do you agree with the proposed change to
> internal_node_cost to anticipate redundant permute eliding?

As far as that change goes, I think it would more convincing to have
a motivating case in which the pass wasn't being used to turn
non-vectorisable code into vectorisable code.

> One could argue that vect_transform_slp_perm_load_1 should
> not fail for those - in this case it fails for
> group_size == dr_group_size == 16 but nunits [4,4], so
> a !repeating_p permute.
>
>> I can see that a similar type of pass might be useful for pushing
>> permutations around until the graph is vectorisable.  But I think
>> that should be a separate subpass that runs first, with a different goal
>> function.  Perhaps it would share enough code with the optimisation pass
>> that they could both be subclasses of a common base class (not sure).
>
> I'll note that at least for permutes on loads we're in the difficult
> situation that permute lowering only handles 
> interleaving patterns right now and other permutes are 
> not lowered but left to vectorizable_load.  Trying to legitimize
> those earlier (anticipating what vectorizable_load would do) would
> be nice but also necessarily dependent on VF (permute lowering also
> looks at VF and at a point where it's not yet final ...)
>
> But as you say, legitimizing should be separate from optimizing but
> the issue at hand is that optimizing on a not legitimized graph
> ties its hands because of the reverse permute requirement ...

Yeah, but like I say, I think the legitimising should happen first.
The hand-tying wouldn't matter then: the optimisation pass would start
with a vectorisable graph (as far as permutes go) and would have an
opportunity to improve it.

I think the general legitimisation problem isn't just about eliding.
It's about pushing permutations around until a new (possibly non-empty)
set of permutations work.  And since it's a question of legitimisation,
we should do that even if it involves "pessimising" hotter blocks to make
colder blocks work, which is something that the optimisation pass tries
to avoid (when optimising for speed).

>> > The patch fixes up internal_node_cost to anticipate redundant permute
>> > eliding and short-cuts the above requirement on edges to partitions
>> > that do not have a prefered layout with the idea we shouldn't have
>> > the need to revert to layout zero for those.
>> >
>> > The downside is that we now can arrive at invalid layout configurations
>> > which we deal with simply giving up.
>> 
>> Right.  I think that for other testcases that are currently successfully
>> optimised, and that are vectorisable with and without the optimisation,
>> we might lose optimisations by doing this, because the pass could get
>> stuck in a dead end.  The risk is probably higher for VLA testcases,
>> due to the same property that motivates this patch.
>> 
>> The change therefore feels like a bit of a hack to me.
>
> It probably is, esp. singling out that -1 layout.  The question is
> what the alternative is?

Running a new legitimisation pass just before the optimisation pass
seems like the way to go to me, but I realise it's not a quick fix.

>
> What's the purpose of rejecting the layout at
>
>                     /* Reject the layout if it would make layout 0 
> impossible
>                        for later partitions.  This amounts to testing that 
> the
>                        target supports reversing the layout change on 
> edges
>                        to later partitions.
>           
>                        In principle, it might be possible to push a layout
>                        change all the way down a graph, so that it never
>                        needs to be reversed and so that the target doesn't
>                        need to support the reverse operation.  But it 
> would
>                        be awkward to bail out if we hit a partition that
>                        does not support the new layout, especially since
>                        we are not dealing with a lattice.  */
>                     is_possible &= edge_layout_cost (ud, other_node_i, 0,
>                                                      layout_i).is_possible 
> ();
>
> ?  It seems we want to enable the backward pass to undo everything the
> forward pass did, because that might have gone "too far"?  But we
> do not check that we can "undo", aka go back to the initial chosen
> layout (which is not always 0 but matches the reverse load permutation?)
> My alternative idea was to not require the ability to go back if
> layout 0 is "impossible" in terms of internal_node_cost?

The assumption is that layout 0 is the conservative choice.  Of course,
that isn't true if the original layout isn't vectorisable, but like
I say, it's not IMO the pass's job to handle that case.

Thanks,
Richard

Reply via email to