[Nouveau] Question on MME and Compute Subchannel in Kepler+
Hello everyone, I've been trying to adapt a Switch emulator to emulate nouveau's compute. We've been told that some things, like indirect dispatch, use the MME in Nouveau. However, looking at NVIDIA's open GPU documentation, there's no MME in the compute engine since Kepler. https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/compute/clb1c0.h MME for compute should still exist because of the presence of the MME Shadow Memory Scratch registers. How does MME for compute work on Kepler+? Does it use the 3D subchannel? If so, how do I know when it's targeting the compute subchannel? ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] Questions on Maxwell 2nd Gen Compute Kernels/Shaders
We have been busy implementing the compute engine lately, but we have run into a few issues with compute shaders. I hope you guys can answer some questions.
1st: How do I determine the size of a compute shader/kernel's local memory? In pipeline shaders the size is included in the header, but compute kernels don't have a header, so how do I determine how much local memory one uses? If I can't, is there a limit?
2nd: I backtrack the addresses used by LDG to the constbuffer that stores them, and then use those addresses to compute the address in my emulated SSBO. For fragment, geometry and vertex shaders I have no problems with these addresses. For compute shaders the addresses seem to be invalid; I imagine there's a base address that's added to them. Where can I obtain that base address?
3rd: The SUATOM instruction's CAS is similar to CompareAndSwap, except it may add 1 or 2 to the data register on store. How do I know when it adds 1 or 2?
Thanks in advance.
[Nouveau] Questions on Falcon Command Processor
So now I'm looking to implement NVDec, and as far as I know the game submits a series of commands to the service. These commands are processed by Falcon, which then does its magic. Do you guys have any RE on Falcon commands and how they execute different workloads?
[Nouveau] Question on Conditional Rendering Maxwell/Pascal
We are currently running tests and making our emulator comply with them. Currently the conditional rendering test does not pass (no wonder, we don't even implement it). I've been looking at the current documentation: https://github.com/envytools/envytools/blob/master/rnndb/graph/gf100_3d.xml#L796 So far I don't understand how the cond address is used and what it's compared to. https://github.com/envytools/envytools/blob/0d91b8bcef3ceb47ff0b114025d301edb790d472/rnndb/g80_defs.xml#L61 It says it uses 2 queries; how do I know which queries it's talking about and what comparison should be done? Also, does failing conditional rendering stop register writes from reaching the engine, or are just draw calls/clears ignored?
[Nouveau] Question on interoperability with Nouveau
Hi guys again. A homebrew developer (homebrew is custom software made for the Switch, here using OpenGL under nouveau) reported to me that 'glGenerateMipmap' wasn't working on yuzu (Nintendo Switch emulator). I looked into it and noticed that all the triangle data used by nouveau to render the mipmaps was zeroed out, which probably means we don't implement the mechanism you guys use to upload that data. How can I track this in your code and find out what you use to upload the triangle data into GPU memory?
[Nouveau] Questions on syncing mechanisms
I have been implementing syncing mechanisms in yuzu (Switch emulator, i.e. Tegra X1 emulation) and I already have semaphores, syncpoints and queries to some extent. I'm missing the barriers (GPU waits for CPU). I got this from RE:
Barrier mode has priority (from highest to lowest):
1) 0x08 sets needsWfi=0 -> highest priority, does puller refcnt + split(0,0) + 0x100 NoOperation + rest of cmds + split(1,1)
2) 0x01 sets needsWfi=0 -> uses 0x0110 Serialize/"WaitForIdle"
3) 0x02 sets needsWfi=1 -> uses 0xDE0 (??)
4) 0x04 sets needsWfi=1 -> uses 0xF7C (??) <-- tile related? Used by nvnQueue_ctrlTiledDownsample
5) nothing; sets needsWfi=0
Do you guys have any info on this and on how the GPU must behave in each situation? For instance, on Serialize, what should the GPU be waiting for? I do know you guys use that one; I'm not sure about the rest.
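The priority list above can be sketched as a small selection function. This is only an illustration of the ordering the post describes: it assumes the barrier mode is a bitmask whose flags may be combined, and the names `BarrierPath`, `BarrierDecision` and `SelectBarrier` are hypothetical, not taken from nouveau, yuzu or NVN.

```cpp
#include <cstdint>

// Hypothetical names for the five cases listed in the post.
enum class BarrierPath { PullerRefcnt, Serialize, Method0xDE0, Method0xF7C, None };

struct BarrierDecision {
    BarrierPath path; // which command sequence the post says is emitted
    bool needs_wfi;   // the needsWfi value the post attributes to that case
};

// Assumption: `mode` is a bitmask, and the highest-priority set bit wins.
constexpr BarrierDecision SelectBarrier(std::uint32_t mode) {
    if (mode & 0x08) return {BarrierPath::PullerRefcnt, false}; // puller refcnt + split
    if (mode & 0x01) return {BarrierPath::Serialize, false};    // 0x0110 Serialize
    if (mode & 0x02) return {BarrierPath::Method0xDE0, true};   // purpose unknown
    if (mode & 0x04) return {BarrierPath::Method0xF7C, true};   // possibly tile related
    return {BarrierPath::None, false};
}
```

What the GPU actually waits for in each case is the open question of the post, so the sketch only encodes which path is chosen, not its behavior.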
Re: [Nouveau] Questions on GPU syncpoint handling from inside the GPU
bump

On Tue, 2 Apr 2019 at 11:11, Fernando Sahmkow () wrote:
> Hi guys, how are you doing? I have some questions on how the GPU handles
> syncpoints from the command list.
>
> I do know that register 0xB2 is the one written in the Maxwell3D engine. As
> far as I know, bits 0:15 are the syncpoint id, bit 16 is unknown to me, and
> bit 20 is increment? What other bits are set, and what should be done on
> increment?
>
> Thanks in advance.
-- 
Sincerely, *Fernando A. Sahmkow* *Mobile*: +584242280286 *Email*: fsahmko...@gmail.com
[Nouveau] Questions on GPU syncpoint handling from inside the GPU
Hi guys, how are you doing? I have some questions on how the GPU handles syncpoints from the command list. I do know that register 0xB2 is the one written in the Maxwell3D engine. As far as I know, bits 0:15 are the syncpoint id, bit 16 is unknown to me, and bit 20 is increment? What other bits are set, and what should be done on increment? Thanks in advance.
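For reference, the field layout guessed at in this message can be written down as a small decoder. It is a sketch under the post's own assumptions: only bits 0:15 and bit 20 are named there, the meaning of bit 20 is itself a question, and `SyncpointOp` and `DecodeSyncpointOp` are hypothetical names.

```cpp
#include <cstdint>

// Fields of a value written to register 0xB2, as described in the post.
struct SyncpointOp {
    std::uint32_t id;   // bits 0:15, syncpoint id
    bool unknown_bit16; // bit 16, meaning unknown
    bool increment;     // bit 20, assumed to request an increment
};

// Splits the raw register value into the fields above; all other bits are ignored.
constexpr SyncpointOp DecodeSyncpointOp(std::uint32_t raw) {
    return SyncpointOp{raw & 0xFFFFu, ((raw >> 16) & 1u) != 0, ((raw >> 20) & 1u) != 0};
}
```

Keeping the unknown bit as its own field makes it easy to log whenever a game sets it.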
[Nouveau] Render Targets and Pitch Linear Textures in Maxwell/Pascal Question
I have been going over the documentation trying to figure out the exact layout of pitch linear textures and to find some missing values. First question: What's the correct layout of pitch linear textures in memory? Is the pitch padding added at the start or at the end? Do they have some kind of header? Currently I treat them as a normal texture matrix with just the pitch padding at the end of each row, but if I send the game these values, it goes into an endless loop. Second question: pitch value in render targets. In the registers at https://github.com/envytools/envytools/blob/master/rnndb/graph/gf100_3d.xml#L282 the pitch does not appear anywhere; where is it stored for render targets? Thanks in advance, and I hope you guys can provide me some info.
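The layout the poster currently assumes (rows stored one after another, each padded at the end up to the pitch, with no header) corresponds to the usual pitch-linear addressing formula. The sketch below encodes only that assumption; whether the hardware matches it is exactly what the post asks, and `PitchLinearOffset` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Byte offset of texel (x, y) in a pitch-linear surface.
// Assumptions: no header, row 0 first, `pitch` is the distance in bytes between
// the starts of consecutive rows, and any padding sits at the end of each row.
constexpr std::size_t PitchLinearOffset(std::uint32_t x, std::uint32_t y,
                                        std::uint32_t pitch,
                                        std::uint32_t bytes_per_pixel) {
    return static_cast<std::size_t>(y) * pitch +
           static_cast<std::size_t>(x) * bytes_per_pixel;
}
```

With a 256-byte pitch and 4 bytes per pixel, texel (3, 2) lands at byte 2 * 256 + 3 * 4 = 524.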
[Nouveau] Question on IPA on GM107
I'm trying to track a special value in IPA instruction generation. https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp#L2561 The register at 0x14 (20) is set to some source when "insn->op == OP_PINTERP". While emulating, I have found that this value can sometimes be set to FragCoord.w; however, I don't know what that value is or how to represent it in GLSL. Do you guys know where that value comes from and what it means?
-- 
Sincerely, *Fernando A. Sahmkow* *Mobile*: +584242280286 *Email*: fsahmko...@gmail.com
[Nouveau] Questions on Blocklinear Mipmaps and auto-sizing
I'm currently implementing mipmaps, but I'm having trouble working out their block height and block depth. According to https://envytools.readthedocs.io/en/latest/hw/memory/g80-surface.html#textures-mipmapping-and-arrays the texture unit auto-resizes the mipmaps' blocks, but how do I know how many blocks each one uses? I'm currently using this algorithm:

    u32 height = MipHeight(mip_level);
    u32 gobs_in_y = (height + 7) / 8;
    u32 bh = block_height;
    // Magical block resizing algorithm, needs more testing.
    while (bh > 1 && (gobs_in_y + bh - 1) / bh <= 2) {
        bh >>= 1;
    }
    return bh;

It works 95% of the time but doesn't always produce the correct block size. Do you guys have any info on the correct algorithm used?
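To try the heuristic above against known mip chains, it can be wrapped into a self-contained function. This is the poster's heuristic as posted, not the hardware's algorithm. The `MipHeight` helper (halving the base height per level, clamped to 1), the 8-row GOB height and the function signature are assumptions added for illustration.

```cpp
#include <algorithm>
#include <cstdint>

using u32 = std::uint32_t;

// Assumption: each mip level halves the base height, clamped to 1 row.
// `mip_level` must be below 32.
constexpr u32 MipHeight(u32 base_height, u32 mip_level) {
    return std::max<u32>(1, base_height >> mip_level);
}

// The poster's heuristic: halve the block height (counted in GOBs) while the
// level's height in GOBs, rounded up to whole blocks, is at most two blocks.
// A GOB is assumed to be 8 rows tall, as in the original snippet.
constexpr u32 MipBlockHeight(u32 base_height, u32 block_height, u32 mip_level) {
    const u32 height = MipHeight(base_height, mip_level);
    const u32 gobs_in_y = (height + 7) / 8;
    u32 bh = block_height;
    while (bh > 1 && (gobs_in_y + bh - 1) / bh <= 2) {
        bh >>= 1;
    }
    return bh;
}
```

Note that with a 256-row base and a block height of 16 GOBs this already shrinks the block height at level 0, which may be one of the places where the heuristic disagrees with the hardware.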
[Nouveau] Question on Render Targets Register: Array Mode
There's a register in render targets called Array Mode: https://github.com/envytools/envytools/blob/master/rnndb/graph/gf100_3d.xml#L289 We've seen values of 1 and 6 (array mode -> layers) but we can't tell their meaning. Do you guys have any related info? Thanks in advance.
[Nouveau] Textures Twiddling/Swizzling
Thanks for the last info, it was truly helpful. Anyway, I'm currently trying to implement 3D textures in yuzu; as far as I know they are twiddled in a different manner from 2D textures. Could one of you point me in the right direction? I've been poking around https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nv50/nv50_tex.c but I can't see where the swizzling actually takes place.
[Nouveau] Questions on Maxwell/Pascal Texture Instructions Modes
Hello, I have some doubts about how texture modes work on the TEX, TEXS, TLD4, etc. instructions. I have the DC, AOFFI, NDV, NODEP, MZ and PTP modes, as well as LZ mode. How do these work, or how do they change the behavior of the texture instruction? So far, of those, I know AOFFI defines an offset, but I'm drawing a blank on the rest.
[Nouveau] Questions Concerning Pascal ISA and IPA Instruction
I'm currently running tests to document Pascal's ISA on CUDA, but I've come to a dead end with the IPA instruction. I've tried searching Nouveau's codebase for clues but have come up short. Could someone point me in the right direction?
-- 
Sincerely, *Fernando A. Sahmkow* *Email*: fsahmko...@gmail.com