Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change
On Wed, 28 Feb 2024 02:24:40 PST (-0800), Robin Dapp wrote:
>> I suggest specifying -fno-schedule-insns to force the tests'
>> assembler output to never change for any scheduling model.
>
> We already do that and that's the point - as I mentioned before, no
> scheduling is worse than default scheduling here (for some definition
> of worse).  The way to reduce the number of vsetvls is to set the
> load latency to a low value.

I think -fno-schedule-insns is a perfectly reasonable way to get rid of
the test failures in the short term.

Using -fno-schedule-insns doesn't really fix the core fragility of the
tests, though: what the pass does depends very much on the order of the
instructions it sees, so anything that reorders RTL is going to cause
churn in the tests.  Sure, getting rid of scheduling will remove a big
cause of reordering, but any pass could reorder RTL and thus change the
expected vsetvl counts.

Maybe the right thing to do here is to rewrite these as RTL tests?
That way we can very tightly control the input ordering.  It's kind of
the opposite of Jeff's suggestion to add more debug output to the pass,
but I think that wouldn't actually solve the issue: we're not having
trouble matching assembly, the fragility comes from the input side.

That might be a "grass is always greener" thing, though, as I don't
think I've managed to write a useful RTL test yet...
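[For reference, the RTL-frontend tests alluded to above live under
gcc.dg/rtl/ in the GCC testsuite and use the __RTL marker to hand a
pass a fixed RTL input.  The following is only a rough sketch of the
shape such a test might take -- the "vsetvl" pass name and the empty
body are placeholders, not a working test:]

    /* { dg-do compile { target riscv*-*-* } } */
    /* { dg-options "-O2" } */

    void __RTL (startwith ("vsetvl")) test (void)
    {
      (function "test"
        (insn-chain
          ;; RTL insns dumped from a normal compilation would go here,
          ;; pinning down exactly the instruction order the vsetvl
          ;; pass sees, independent of scheduling.
        ) ;; insn-chain
      ) ;; function
    }

[The point being that with startwith the compilation begins directly
at the named pass, so no earlier pass can reorder the input.]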
Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change
> I suggest specifying -fno-schedule-insns to force the tests'
> assembler output to never change for any scheduling model.

We already do that and that's the point - as I mentioned before, no
scheduling is worse than default scheduling here (for some definition
of worse).  The way to reduce the number of vsetvls is to set the load
latency to a low value.

Regards
 Robin
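[For concreteness, pinning a test's schedule looks roughly like the
following dejagnu header; the option set, function body, and expected
count here are illustrative only, not taken from the patch under
discussion:]

    /* { dg-do compile } */
    /* { dg-options "-march=rv64gcv -mabi=lp64d -O2 -fno-schedule-insns -fno-schedule-insns2" } */

    /* ... test function under scrutiny ... */

    /* With both scheduling passes disabled the instruction order is
       fixed, so an exact count can be used instead of a range.  */
    /* { dg-final { scan-assembler-times {vsetvli} 2 } } */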
Re: Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change
I suggest specifying -fno-schedule-insns to force the tests' assembler
output to never change for any scheduling model.

juzhe.zh...@rivai.ai

From: Palmer Dabbelt
Date: 2024-02-28 08:55
To: jeffreyalaw
CC: juzhe.zhong; Robin Dapp; ewlu; gcc-patches; gnu-toolchain; pan2.li
Subject: Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change

On Tue, 27 Feb 2024 15:53:19 PST (-0800), jeffreya...@gmail.com wrote:
> On 2/27/24 15:56, 钟居哲 wrote:
>>> I don't think it's that simple.  On some uarchs vsetvls are nearly
>>> free while on others they can be fairly expensive.  It's not clear
>>> (to me) yet if one approach or the other is going to be the more
>>> common.
>>
>> That's uarch dependent, which is not the stuff I am talking about.
>> What I want to say is that this patch breaks those testcases I added
>> for VSETVL PASS testing.  And those testcases are uarch independent.
>
> No, uarch impacts things like latency, which in turn impacts
> scheduling, which in turn impacts vsetvl generation/optimization.

Ya, and I think that's just what's expected for this sort of approach.
Edwin and I were working through that possibility in the office
earlier, but we didn't have the code up.  So I figured I'd just go
through one in more detail to see if what we were talking about was
sane.

Grabbing some arbitrary function in the changed set:

    void test_vbool1_then_vbool64(int8_t * restrict in,
                                  int8_t * restrict out) {
        vbool1_t v1 = *(vbool1_t*)in;
        vbool64_t v2 = *(vbool64_t*)in;
        *(vbool1_t*)(out + 100) = v1;
        *(vbool64_t*)(out + 200) = v2;
    }

we currently get (from generic-ooo)

    test_vbool1_then_vbool64:
        vsetvli a4,zero,e8,m8,ta,ma
        vlm.v   v2,0(a0)
        vsetvli a5,zero,e8,mf8,ta,ma
        vlm.v   v1,0(a0)
        addi    a3,a1,100
        vsetvli a4,zero,e8,m8,ta,ma
        addi    a1,a1,200
        vsm.v   v2,0(a3)
        vsetvli a5,zero,e8,mf8,ta,ma
        vsm.v   v1,0(a1)
        ret

but we could generate correct code with 2, 3, or 4 vsetvli
instructions depending on how things are scheduled.
For example, with -fno-schedule-insns I happen to get 3

    test_vbool1_then_vbool64:
        vsetvli a5,zero,e8,mf8,ta,ma
        vlm.v   v1,0(a0)
        vsetvli a4,zero,e8,m8,ta,ma
        vlm.v   v2,0(a0)
        addi    a3,a1,100
        addi    a1,a1,200
        vsm.v   v2,0(a3)
        vsetvli a5,zero,e8,mf8,ta,ma
        vsm.v   v1,0(a1)
        ret

because the loads/stores with the same vcfg end up scheduled
back-to-back.  I don't see any reason why something along the lines of

    test_vbool1_then_vbool64:
        vsetvli a4,zero,e8,m8,ta,ma
        vlm.v   v2,0(a0)
        addi    a3,a1,100
        vsm.v   v2,0(a3)
        vsetvli a5,zero,e8,mf8,ta,ma
        vlm.v   v1,0(a0)
        addi    a1,a1,200
        vsm.v   v1,0(a1)
        ret

wouldn't be correct (though I just reordered the loads/stores and then
removed the redundant vsetvlis, so I might have some address
calculation wrong in there).  The validity of removing a vsetvli
depends on how the dependent instructions get scheduled, which is very
much under the control of the pipeline model -- it's entirely possible
the code with more vsetvlis is faster, if vsetvli is cheap and
scheduling ends up hiding latency better.  So IMO it's completely
reasonable to have vsetvli count ranges for a test like this.

I haven't looked at the others in any detail, but I remember seeing
similar things elsewhere last time I was poking around these tests.
We should probably double-check all these and write some comments,
just to make sure we're not missing any bugs, but I'd bet there's a
bunch of valid testsuite changes.

Like we talked about in the call this morning, we should probably make
the tests more precise, but that's a larger effort.  After working
through this I'm thinking it's a bit higher priority, though, as in
this case the bounds are so wide we're not even really testing the
pass any more.

> jeff
Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change
On Tue, 27 Feb 2024 15:53:19 PST (-0800), jeffreya...@gmail.com wrote:
> On 2/27/24 15:56, 钟居哲 wrote:
>>> I don't think it's that simple.  On some uarchs vsetvls are nearly
>>> free while on others they can be fairly expensive.  It's not clear
>>> (to me) yet if one approach or the other is going to be the more
>>> common.
>>
>> That's uarch dependent, which is not the stuff I am talking about.
>> What I want to say is that this patch breaks those testcases I added
>> for VSETVL PASS testing.  And those testcases are uarch independent.
>
> No, uarch impacts things like latency, which in turn impacts
> scheduling, which in turn impacts vsetvl generation/optimization.

Ya, and I think that's just what's expected for this sort of approach.
Edwin and I were working through that possibility in the office
earlier, but we didn't have the code up.  So I figured I'd just go
through one in more detail to see if what we were talking about was
sane.

Grabbing some arbitrary function in the changed set:

    void test_vbool1_then_vbool64(int8_t * restrict in,
                                  int8_t * restrict out) {
        vbool1_t v1 = *(vbool1_t*)in;
        vbool64_t v2 = *(vbool64_t*)in;
        *(vbool1_t*)(out + 100) = v1;
        *(vbool64_t*)(out + 200) = v2;
    }

we currently get (from generic-ooo)

    test_vbool1_then_vbool64:
        vsetvli a4,zero,e8,m8,ta,ma
        vlm.v   v2,0(a0)
        vsetvli a5,zero,e8,mf8,ta,ma
        vlm.v   v1,0(a0)
        addi    a3,a1,100
        vsetvli a4,zero,e8,m8,ta,ma
        addi    a1,a1,200
        vsm.v   v2,0(a3)
        vsetvli a5,zero,e8,mf8,ta,ma
        vsm.v   v1,0(a1)
        ret

but we could generate correct code with 2, 3, or 4 vsetvli
instructions depending on how things are scheduled.

For example, with -fno-schedule-insns I happen to get 3

    test_vbool1_then_vbool64:
        vsetvli a5,zero,e8,mf8,ta,ma
        vlm.v   v1,0(a0)
        vsetvli a4,zero,e8,m8,ta,ma
        vlm.v   v2,0(a0)
        addi    a3,a1,100
        addi    a1,a1,200
        vsm.v   v2,0(a3)
        vsetvli a5,zero,e8,mf8,ta,ma
        vsm.v   v1,0(a1)
        ret

because the loads/stores with the same vcfg end up scheduled
back-to-back.
I don't see any reason why something along the lines of

    test_vbool1_then_vbool64:
        vsetvli a4,zero,e8,m8,ta,ma
        vlm.v   v2,0(a0)
        addi    a3,a1,100
        vsm.v   v2,0(a3)
        vsetvli a5,zero,e8,mf8,ta,ma
        vlm.v   v1,0(a0)
        addi    a1,a1,200
        vsm.v   v1,0(a1)
        ret

wouldn't be correct (though I just reordered the loads/stores and then
removed the redundant vsetvlis, so I might have some address
calculation wrong in there).  The validity of removing a vsetvli
depends on how the dependent instructions get scheduled, which is very
much under the control of the pipeline model -- it's entirely possible
the code with more vsetvlis is faster, if vsetvli is cheap and
scheduling ends up hiding latency better.  So IMO it's completely
reasonable to have vsetvli count ranges for a test like this.

I haven't looked at the others in any detail, but I remember seeing
similar things elsewhere last time I was poking around these tests.
We should probably double-check all these and write some comments,
just to make sure we're not missing any bugs, but I'd bet there's a
bunch of valid testsuite changes.

Like we talked about in the call this morning, we should probably make
the tests more precise, but that's a larger effort.  After working
through this I'm thinking it's a bit higher priority, though, as in
this case the bounds are so wide we're not even really testing the
pass any more.

> jeff
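[A range-based check like the one argued for above can be sketched
outside dejagnu.  This hypothetical helper just counts
vsetvli/vsetivli instructions in an assembly dump and accepts any
count inside a window; names and the embedded listing are
illustrative, the listing being the -fno-schedule-insns schedule from
above:]

```python
import re

def count_vsetvl(asm: str) -> int:
    """Count vsetvli/vsetivli instructions in an assembly listing."""
    return len(re.findall(r"\bvset[i]?vli\b", asm))

def check_vsetvl_range(asm: str, lo: int, hi: int) -> bool:
    """Accept any schedule needing between lo and hi vsetvls, inclusive."""
    return lo <= count_vsetvl(asm) <= hi

asm = """\
test_vbool1_then_vbool64:
\tvsetvli a5,zero,e8,mf8,ta,ma
\tvlm.v   v1,0(a0)
\tvsetvli a4,zero,e8,m8,ta,ma
\tvlm.v   v2,0(a0)
\tvsm.v   v2,0(a3)
\tvsetvli a5,zero,e8,mf8,ta,ma
\tvsm.v   v1,0(a1)
\tret
"""

print(count_vsetvl(asm))            # this schedule uses 3
print(check_vsetvl_range(asm, 2, 4))
```

[The caveat from the message above applies: once the accepted window
is as wide as "any correct schedule", the check no longer tests the
pass, only the assembler syntax.]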
Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change
On 2/27/24 15:56, 钟居哲 wrote:
>> I don't think it's that simple.  On some uarchs vsetvls are nearly
>> free while on others they can be fairly expensive.  It's not clear
>> (to me) yet if one approach or the other is going to be the more
>> common.
>
> That's uarch dependent, which is not the stuff I am talking about.
> What I want to say is that this patch breaks those testcases I added
> for VSETVL PASS testing.  And those testcases are uarch independent.

No, uarch impacts things like latency, which in turn impacts
scheduling, which in turn impacts vsetvl generation/optimization.

jeff
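[The latency-to-vsetvl chain can be made concrete with a toy model
(entirely illustrative, not GCC's algorithm): if each vector
instruction carries a vtype and a vsetvl is needed whenever the vtype
differs from the previous vector instruction's, the schedule alone
decides the count.  The two orders below are the interleaved and
grouped versions of the four memory ops from the example function:]

```python
def vsetvls_needed(schedule):
    """Toy model: one vsetvl each time the vtype changes between
    consecutive vector instructions."""
    count = 0
    prev = None
    for insn, vtype in schedule:
        if vtype != prev:
            count += 1
            prev = vtype
    return count

# A latency-driven scheduler may interleave the two vtypes...
interleaved = [("vlm v2", "e8,m8"), ("vlm v1", "e8,mf8"),
               ("vsm v2", "e8,m8"), ("vsm v1", "e8,mf8")]
# ...while a different model may group ops sharing a vtype.
grouped     = [("vlm v2", "e8,m8"), ("vsm v2", "e8,m8"),
               ("vlm v1", "e8,mf8"), ("vsm v1", "e8,mf8")]

print(vsetvls_needed(interleaved))  # 4
print(vsetvls_needed(grouped))      # 2
```

[Same instructions, same semantics, yet the vsetvl count doubles,
which is why a uarch's latency numbers end up visible in these scans.]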
Re: Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change
>> I don't think it's that simple.  On some uarchs vsetvls are nearly
>> free while on others they can be fairly expensive.  It's not clear
>> (to me) yet if one approach or the other is going to be the more
>> common.

That's uarch dependent, which is not the stuff I am talking about.
What I want to say is that this patch breaks those testcases I added
for VSETVL PASS testing.  And those testcases are uarch independent.

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2024-02-27 23:22
To: juzhe.zh...@rivai.ai; Robin Dapp; Edwin Lu; gcc-patches
CC: gnu-toolchain; pan2.li
Subject: Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change

On 2/26/24 18:21, juzhe.zh...@rivai.ai wrote:
> If the scheduling model increases the vsetvls, we shouldn't set it as
> the default scheduling model.

I don't think it's that simple.  On some uarchs vsetvls are nearly
free while on others they can be fairly expensive.  It's not clear (to
me) yet if one approach or the other is going to be the more common.

jeff
Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change
On 2/26/24 18:21, juzhe.zh...@rivai.ai wrote:
> If the scheduling model increases the vsetvls, we shouldn't set it as
> the default scheduling model.

I don't think it's that simple.  On some uarchs vsetvls are nearly
free while on others they can be fairly expensive.  It's not clear (to
me) yet if one approach or the other is going to be the more common.

jeff
Re: Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change
If the scheduling model increases the vsetvls, we shouldn't set it as
the default scheduling model.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2024-02-26 21:29
To: Edwin Lu; gcc-patches
CC: rdapp.gcc; gnu-toolchain; pan2.li; juzhe.zh...@rivai.ai
Subject: Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change

On 2/24/24 00:10, Edwin Lu wrote:
> Given the recent change adding the scheduler pipeline descriptions,
> many scan-dump failures emerged.  Relax the expected assembler output
> conditions on the affected tests to reduce noise.

I'm not entirely sure yet about relaxing the scans like this.  There
seem to be uarchs that want to minimize vsetvls under all
circumstances while others don't seem to care all that much.

We could (not must) assume that the tests that now regress were
written with this minimization aspect in mind and that we'd want to be
sure that we still manage to emit the minimal number of vsetvls.

Why is the new upper bound acceptable?  What if a vector_load cost of
12 (or so) causes even more vsetvls?  The 6 in generic_ooo was more or
less arbitrarily chosen.

My suggestion before was to create another sched model that has load
costs like before and run the regressing tests with that model.
That's of course also not really ideal and actually shoehorned a bit,
in particular as no scheduling also increases the number of vsetvls.

Juzhe: What's your intention with those tests?  I'd suppose you want
the vsetvl number to be minimal here and not higher?  Did you plan to
add a particular scheduling model or are you happy with the default
(all 1) latencies?

Regards
 Robin
Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change
On 2/24/24 00:10, Edwin Lu wrote:
> Given the recent change adding the scheduler pipeline descriptions,
> many scan-dump failures emerged.  Relax the expected assembler output
> conditions on the affected tests to reduce noise.

I'm not entirely sure yet about relaxing the scans like this.  There
seem to be uarchs that want to minimize vsetvls under all
circumstances while others don't seem to care all that much.

We could (not must) assume that the tests that now regress were
written with this minimization aspect in mind and that we'd want to be
sure that we still manage to emit the minimal number of vsetvls.

Why is the new upper bound acceptable?  What if a vector_load cost of
12 (or so) causes even more vsetvls?  The 6 in generic_ooo was more or
less arbitrarily chosen.

My suggestion before was to create another sched model that has load
costs like before and run the regressing tests with that model.
That's of course also not really ideal and actually shoehorned a bit,
in particular as no scheduling also increases the number of vsetvls.

Juzhe: What's your intention with those tests?  I'd suppose you want
the vsetvl number to be minimal here and not higher?  Did you plan to
add a particular scheduling model or are you happy with the default
(all 1) latencies?

Regards
 Robin
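[The "another sched model with the old load costs" idea would, as a
rough sketch, amount to a machine-description insn reservation along
the following lines.  The automaton, unit, tune, and type attribute
values here are made up for illustration and are not the actual
riscv.md names:]

    ;; Hypothetical pipeline description keeping vector load latency
    ;; at 1 cycle, mirroring the pre-change (all 1) behavior.
    (define_automaton "hypo_vec")
    (define_cpu_unit "hypo_vec_mem" "hypo_vec")

    (define_insn_reservation "hypo_vec_load" 1
      (and (eq_attr "tune" "hypothetical")
           (eq_attr "type" "vlde,vldm"))
      "hypo_vec_mem")

[With a latency of 1 the scheduler has no incentive to hoist and
interleave the loads, so the regressing tests would see the original
instruction order.]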