Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. [1] http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html [2] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html [3] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html [4] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076098.html --Jason On Thu, Feb 19, 2015 at 1:53 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez curroje...@riseup.net wrote: Hey Matt, Matt Turner matts...@gmail.com writes: On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? That test is for handling MRF sources -- More precisely, it's collecting the writemask and half flags for MRF writes, which can only possibly be useful if we're going to use them later on to read something out of an MRF into a payload, which we shouldn't be doing in the first place. Aside from simplifying the function somewhat, that allows us to drop the 16 register gap reserved for MRFs at register offset zero, what will allow us to drop the vgrf_to_reg[] offset calculation completely (also in split_virtual_grfs()) in a future patch (not sent for review yet). No, we do read from MRF's sort-of... Send messages have an implicit read from an MRF. Heh, and that's pretty much the only way you read from it. This was written precicely so that we could use LOAD_PAYLOAD to build MRF payloads. We do on pre-GEN6. I'm aware, but you don't need any of this meta-data to LOAD_PAYLOAD *into* an MRF, and LOAD_PAYLOAD with an MRF as source should be illegal anyway. And no one is using it that way. All of the metadata checks you are deleting are checks on the *destination*. Didn't you write this code yourself? The only use for the collected metadata is initializing the instruction flags of the MOVs subsequent LOAD_PAYLOAD instructions are lowered to, based on the metadata already collected for its source registers, which can never be MRFs, so the metadata you collect from MRF writes is never actually used. Right... I misred something initially. Yes, we should never be tracking MRF's as a source of a LOAD_PAYLOAD. I'll give it a better look a bit later, but it looks better. pgpjBO09kHb1i.pgp Description: PGP signature
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. Regarding the other suggestion of just requiring width == exec_size for immediates and uniforms, that seems pretty reasonable to me. I'd like to know what it will do to optimizations, but it seems ok initially. I'm still a bigger fan of just subclassing and stashing everything we need to know up-front. If we do it right, the only things that will actually need to know about the subclass are the function for creating a LOAD_PAYLOAD and lower_load_payloads. --Jason [1] http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html [2] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html [3] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html [4] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076098.html --Jason On Thu, Feb 19, 2015 at 1:53 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez curroje...@riseup.net wrote: Hey Matt, Matt Turner matts...@gmail.com writes: On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? That test is for handling MRF sources -- More precisely, it's collecting the writemask and half flags for MRF writes, which can only possibly be useful if we're going to use them later on to read something
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd say it's a bug in the liveness analysis pass if it wouldn't consider the liveness interval of a and b to overlap in your example if the assignment of b had force_writemask_all set. A reasonable compromise might be to leave it up to the caller whether to set the force_writemask_all and force_sechalf flags or not. I.e. just copy the same flags set on the LOAD_PAYLOAD instruction to the MOV instructions. That would still allow reducing the liveness intervals in cases where we know that the payload respects channel boundaries. Tracking the flag information per-register in cases where the same payload has well- and ill-behaved values seems rather premature and rather useless to me because the optimizer is likely to be able to get rid of redundant copies with force_writemask_all -- register coalesce is doing this already AFAIK, maybe by accident. Regarding the other suggestion of just requiring width == exec_size for immediates and uniforms, that seems pretty reasonable to me. I'd like to know what it will do to optimizations, but it seems ok initially. I'm still a bigger fan of just subclassing and stashing everything we need to know up-front. If we do it right, the only things that will actually need to know about the subclass are the function for creating a LOAD_PAYLOAD and lower_load_payloads. --Jason [1] http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html [2] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html [3] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html [4]
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd say it's a bug in the liveness analysis pass if it wouldn't consider the liveness interval of a and b to overlap in your example if the assignment of b had force_writemask_all set. Yes, I'm aware of that. However, the part of that register that has to squash everything is usually only a couple of registers while the entire payload may be up to 13 (if I remmeber correctly). We'd rather not have to squash everything if we can. All that being said, our liveness/register allocation can't handle this and making register allocation handle parts of registers interfering but not other bits is going to be a real pain. Maybe not even worth it. If you'd rather force_writemask_all everything, I'll sign off on that for now. I just wanted to point out that it may not be a good permanent solution. A reasonable compromise might be to leave it up to the caller whether to set the force_writemask_all and force_sechalf flags or not. I.e. just copy the same flags set on the LOAD_PAYLOAD instruction to the MOV instructions. That would still allow reducing the liveness intervals in cases where we know that the payload respects channel boundaries. Tracking the flag information per-register in cases where the same payload has well- and ill-behaved values seems rather premature and rather useless to me because the optimizer is likely to be able to get rid of redundant copies with force_writemask_all -- register coalesce is
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
One more comment here... This particularly regards your plan of separating it into things that match the destination and other things and not copy prop uniforms or immediates into the other things. There is another case we need to handle. On older gens (SNB maybe?) the SIMD16 FB write messages are in the order (r0, g0, b0, a0, r1, g1, b1, a1) instead of the usual (r0, r1, g0, g1, b0, b1, a0, a1). This isn't going to work if we expect SIMD16 load_payload instructions to be lowerable to a bunch of SIMD16 MOVs. On those platforms, there is a special type of MOV-to-MRF that can move from (gN, gN+1) to (mK, mK+4) and we handle that in lower_load_payload right now. However, it doesn't follow the standard rules. --Jason On Fri, Feb 20, 2015 at 2:10 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd say it's a bug in the liveness analysis pass if it wouldn't consider the liveness interval of a and b to overlap in your example if the assignment of b had force_writemask_all set. Yes, I'm aware of that. However, the part of that register that has to squash everything is usually only a couple of registers while the entire payload may be up to 13 (if I remmeber correctly). We'd rather not have to squash everything if we can. All that being said, our liveness/register allocation can't handle this and making register allocation handle parts of registers interfering but not other bits is going to be a real pain.
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd say it's a bug in the liveness analysis pass if it wouldn't consider the liveness interval of a and b to overlap in your example if the assignment of b had force_writemask_all set. Yes, I'm aware of that. However, the part of that register that has to squash everything is usually only a couple of registers while the entire payload may be up to 13 (if I remmeber correctly). We'd rather not have to squash everything if we can. All that being said, our liveness/register allocation can't handle this and making register allocation handle parts of registers interfering but not other bits is going to be a real pain. Maybe not even worth it. If you'd rather force_writemask_all everything, I'll sign off on that for now. I just wanted to point out that it may not be a good permanent solution. A reasonable compromise might be to leave it up to the caller whether to set the force_writemask_all and force_sechalf flags or not. I.e. just copy the same flags set on the LOAD_PAYLOAD instruction to the MOV instructions. That would still allow reducing the liveness intervals in cases where we know that the payload respects channel boundaries. Tracking the flag information per-register in cases where the same payload has well- and ill-behaved values seems rather premature and rather useless to me because the optimizer is likely to be able to get rid of redundant
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
Jason Ekstrand ja...@jlekstrand.net writes: One more comment here... This particularly regards your plan of separating it into things that match the destination and other things and not copy prop uniforms or immediates into the other things. There is another case we need to handle. On older gens (SNB maybe?) the SIMD16 FB write messages are in the order (r0, g0, b0, a0, r1, g1, b1, a1) instead of the usual (r0, r1, g0, g1, b0, b1, a0, a1). This isn't going to work if we expect SIMD16 load_payload instructions to be lowerable to a bunch of SIMD16 MOVs. On those platforms, there is a special type of MOV-to-MRF that can move from (gN, gN+1) to (mK, mK+4) and we handle that in lower_load_payload right now. However, it doesn't follow the standard rules. --Jason I don't see why it couldn't be handled in roughly the same way you do it now? Recognize when src[i + 4] is the same 8-wide register as src[i] shifted by 8 and emit a COMPR4 copy in that case? On Fri, Feb 20, 2015 at 2:10 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd say it's a bug in the liveness analysis pass if it wouldn't consider the liveness interval of a and b to overlap in your example if the assignment of b had force_writemask_all set. Yes, I'm aware of that. However, the part of that register that has to squash everything is usually only a couple of registers while the entire payload may be up to 13 (if I remmeber
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
On Fri, Feb 20, 2015 at 2:43 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: One more comment here... This particularly regards your plan of separating it into things that match the destination and other things and not copy prop uniforms or immediates into the other things. There is another case we need to handle. On older gens (SNB maybe?) the SIMD16 FB write messages are in the order (r0, g0, b0, a0, r1, g1, b1, a1) instead of the usual (r0, r1, g0, g1, b0, b1, a0, a1). This isn't going to work if we expect SIMD16 load_payload instructions to be lowerable to a bunch of SIMD16 MOVs. On those platforms, there is a special type of MOV-to-MRF that can move from (gN, gN+1) to (mK, mK+4) and we handle that in lower_load_payload right now. However, it doesn't follow the standard rules. --Jason I don't see why it couldn't be handled in roughly the same way you do it now? Recognize when src[i + 4] is the same 8-wide register as src[i] shifted by 8 and emit a COMPR4 copy in that case? Sure. I haven't thought about it hard enough to prove it can't be done. Just wanted to make sure you were aware of that particular monkey-wrench. --Jason On Fri, Feb 20, 2015 at 2:10 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez curroje...@riseup.net wrote: Hey Matt, Matt Turner matts...@gmail.com writes: On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? That test is for handling MRF sources -- More precisely, it's collecting the writemask and half flags for MRF writes, which can only possibly be useful if we're going to use them later on to read something out of an MRF into a payload, which we shouldn't be doing in the first place. Aside from simplifying the function somewhat, that allows us to drop the 16 register gap reserved for MRFs at register offset zero, what will allow us to drop the vgrf_to_reg[] offset calculation completely (also in split_virtual_grfs()) in a future patch (not sent for review yet). No, we do read from MRF's sort-of... Send messages have an implicit read from an MRF. This was written precicely so that we could use LOAD_PAYLOAD to build MRF payloads. We do on pre-GEN6. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez curroje...@riseup.net wrote: Hey Matt, Matt Turner matts...@gmail.com writes: On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? That test is for handling MRF sources -- More precisely, it's collecting the writemask and half flags for MRF writes, which can only possibly be useful if we're going to use them later on to read something out of an MRF into a payload, which we shouldn't be doing in the first place. Aside from simplifying the function somewhat, that allows us to drop the 16 register gap reserved for MRFs at register offset zero, what will allow us to drop the vgrf_to_reg[] offset calculation completely (also in split_virtual_grfs()) in a future patch (not sent for review yet). No, we do read from MRF's sort-of... Send messages have an implicit read from an MRF. Heh, and that's pretty much the only way you read from it. This was written precicely so that we could use LOAD_PAYLOAD to build MRF payloads. We do on pre-GEN6. I'm aware, but you don't need any of this meta-data to LOAD_PAYLOAD *into* an MRF, and LOAD_PAYLOAD with an MRF as source should be illegal anyway. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev pgpfF5PKyva7v.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez curroje...@riseup.net wrote: Hey Matt, Matt Turner matts...@gmail.com writes: On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? That test is for handling MRF sources -- More precisely, it's collecting the writemask and half flags for MRF writes, which can only possibly be useful if we're going to use them later on to read something out of an MRF into a payload, which we shouldn't be doing in the first place. Aside from simplifying the function somewhat, that allows us to drop the 16 register gap reserved for MRFs at register offset zero, what will allow us to drop the vgrf_to_reg[] offset calculation completely (also in split_virtual_grfs()) in a future patch (not sent for review yet). No, we do read from MRF's sort-of... Send messages have an implicit read from an MRF. Heh, and that's pretty much the only way you read from it. This was written precicely so that we could use LOAD_PAYLOAD to build MRF payloads. We do on pre-GEN6. I'm aware, but you don't need any of this meta-data to LOAD_PAYLOAD *into* an MRF, and LOAD_PAYLOAD with an MRF as source should be illegal anyway. And no one is using it that way. All of the metadata checks you are deleting are checks on the *destination*. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez curroje...@riseup.net wrote: Hey Matt, Matt Turner matts...@gmail.com writes: On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? That test is for handling MRF sources -- More precisely, it's collecting the writemask and half flags for MRF writes, which can only possibly be useful if we're going to use them later on to read something out of an MRF into a payload, which we shouldn't be doing in the first place. Aside from simplifying the function somewhat, that allows us to drop the 16 register gap reserved for MRFs at register offset zero, what will allow us to drop the vgrf_to_reg[] offset calculation completely (also in split_virtual_grfs()) in a future patch (not sent for review yet). No, we do read from MRF's sort-of... Send messages have an implicit read from an MRF. Heh, and that's pretty much the only way you read from it. This was written precicely so that we could use LOAD_PAYLOAD to build MRF payloads. We do on pre-GEN6. I'm aware, but you don't need any of this meta-data to LOAD_PAYLOAD *into* an MRF, and LOAD_PAYLOAD with an MRF as source should be illegal anyway. And no one is using it that way. All of the metadata checks you are deleting are checks on the *destination*. Didn't you write this code yourself? The only use for the collected metadata is initializing the instruction flags of the MOVs subsequent LOAD_PAYLOAD instructions are lowered to, based on the metadata already collected for its source registers, which can never be MRFs, so the metadata you collect from MRF writes is never actually used. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev pgpznbosO3BD6.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez curroje...@riseup.net wrote: Hey Matt, Matt Turner matts...@gmail.com writes: On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? That test is for handling MRF sources -- More precisely, it's collecting the writemask and half flags for MRF writes, which can only possibly be useful if we're going to use them later on to read something out of an MRF into a payload, which we shouldn't be doing in the first place. Aside from simplifying the function somewhat, that allows us to drop the 16 register gap reserved for MRFs at register offset zero, what will allow us to drop the vgrf_to_reg[] offset calculation completely (also in split_virtual_grfs()) in a future patch (not sent for review yet). No, we do read from MRF's sort-of... Send messages have an implicit read from an MRF. Heh, and that's pretty much the only way you read from it. This was written precicely so that we could use LOAD_PAYLOAD to build MRF payloads. We do on pre-GEN6. I'm aware, but you don't need any of this meta-data to LOAD_PAYLOAD *into* an MRF, and LOAD_PAYLOAD with an MRF as source should be illegal anyway. And no one is using it that way. All of the metadata checks you are deleting are checks on the *destination*. Didn't you write this code yourself? The only use for the collected metadata is initializing the instruction flags of the MOVs subsequent LOAD_PAYLOAD instructions are lowered to, based on the metadata already collected for its source registers, which can never be MRFs, so the metadata you collect from MRF writes is never actually used. Right... I misred something initially. Yes, we should never be tracking MRF's as a source of a LOAD_PAYLOAD. I'll give it a better look a bit later, but it looks better. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. --Jason On Thu, Feb 19, 2015 at 1:53 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez curroje...@riseup.net wrote: Hey Matt, Matt Turner matts...@gmail.com writes: On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? That test is for handling MRF sources -- More precisely, it's collecting the writemask and half flags for MRF writes, which can only possibly be useful if we're going to use them later on to read something out of an MRF into a payload, which we shouldn't be doing in the first place. Aside from simplifying the function somewhat, that allows us to drop the 16 register gap reserved for MRFs at register offset zero, what will allow us to drop the vgrf_to_reg[] offset calculation completely (also in split_virtual_grfs()) in a future patch (not sent for review yet). No, we do read from MRF's sort-of... Send messages have an implicit read from an MRF. Heh, and that's pretty much the only way you read from it. This was written precicely so that we could use LOAD_PAYLOAD to build MRF payloads. We do on pre-GEN6. I'm aware, but you don't need any of this meta-data to LOAD_PAYLOAD *into* an MRF, and LOAD_PAYLOAD with an MRF as source should be illegal anyway. And no one is using it that way. All of the metadata checks you are deleting are checks on the *destination*. Didn't you write this code yourself? The only use for the collected metadata is initializing the instruction flags of the MOVs subsequent LOAD_PAYLOAD instructions are lowered to, based on the metadata already collected for its source registers, which can never be MRFs, so the metadata you collect from MRF writes is never actually used. Right... I misred something initially. Yes, we should never be tracking MRF's as a source of a LOAD_PAYLOAD. I'll give it a better look a bit later, but it looks better. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
Hey Matt, Matt Turner matts...@gmail.com writes: On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? That test is for handling MRF sources -- More precisely, it's collecting the writemask and half flags for MRF writes, which can only possibly be useful if we're going to use them later on to read something out of an MRF into a payload, which we shouldn't be doing in the first place. Aside from simplifying the function somewhat, that allows us to drop the 16 register gap reserved for MRFs at register offset zero, what will allow us to drop the vgrf_to_reg[] offset calculation completely (also in split_virtual_grfs()) in a future patch (not sent for review yet). pgpTS_XOX7QSF.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev