Re: [cedar] DMA ring test timeout [solved]

2022-05-06 Thread Amol
On 06/05/2022, Amol wrote:
> Hello,
>
> While trying to program the HD 7350 Cedar GPU to run with DPM
> under the 157MHz/200MHz sclk/mclk powerstate, for single_display,
> and with forced LOW performance on the SMC, the DMA ring seems
> to hang.
>
. . .
. . .
>
> Does this mean that the GPU doesn't support running DMA ring at the
> lowest perf profile (157MHz/200MHz)? I do still believe that this
> situation might be a result of faulty/missing programming on my part,
> though I am not sure what exactly it is that is at fault or is missing.

The mc_reg_table was being populated with invalid entries.

Thanks,
Amol


[cedar] DMA ring test timeout

2022-05-05 Thread Amol
Hello,

While trying to program the HD 7350 Cedar GPU to run with DPM
under the 157MHz/200MHz sclk/mclk powerstate, for single_display,
and with forced LOW performance on the SMC, the DMA ring seems
to hang.

After the desired power state is programmed, the 0xcafedead tests of
the DMA and CP rings are run. The CP ring test succeeds but the DMA ring
test times out. Note that the Linux radeon driver does not wait until so
late in its initialization to run these tests.

The GPU's DMA ring RPTR is found to be at index 3 (it should be at
index 4 after consuming all 4 32-bit words, when starting at index 0).
Since the write-back of GPU's RPTR is successful, the DMA from
GPU to system RAM works.
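
For readers following along, the 4-dword WRITE packet that this test queues can be sketched as below. This is only an illustration under assumptions: the opcode values are the ones quoted in this thread (WRITE=2, TRAP=7, NOP=15), while the header layout (opcode in bits 31:28, dword count in the low bits), the 8-bit limit on the upper address bits, and the helper names are assumptions, not taken from the messages here.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values as quoted in this thread. */
#define DMA_PACKET_WRITE 0x2u
#define DMA_PACKET_TRAP  0x7u
#define DMA_PACKET_NOP   0xFu

/* Assumed header layout: opcode in bits 31:28, dword count in bits 15:0. */
static uint32_t dma_packet(uint32_t cmd, uint32_t n)
{
    return ((cmd & 0xFu) << 28) | (n & 0xFFFFu);
}

/*
 * Queue a WRITE packet that stores one dword (`value`) at `gpu_addr`.
 * `ring_dwords` must be a power of two. Returns the new write pointer,
 * counted in dwords.
 */
static uint32_t dma_ring_emit_write(uint32_t *ring, uint32_t ring_dwords,
                                    uint32_t wptr, uint64_t gpu_addr,
                                    uint32_t value)
{
    assert(ring_dwords != 0 && (ring_dwords & (ring_dwords - 1)) == 0);
    uint32_t mask = ring_dwords - 1;

    ring[wptr++ & mask] = dma_packet(DMA_PACKET_WRITE, 1);
    ring[wptr++ & mask] = (uint32_t)(gpu_addr & 0xFFFFFFFFu);
    ring[wptr++ & mask] = (uint32_t)(gpu_addr >> 32) & 0xFFu; /* assumed 40-bit address */
    ring[wptr++ & mask] = value;
    return wptr & mask;
}

/* The ring has been fully consumed once RPTR has caught up with WPTR. */
static int dma_ring_consumed(uint32_t rptr, uint32_t wptr)
{
    return rptr == wptr;
}
```

With a packet queued at index 0, the write pointer ends at 4, so an RPTR of 3 (as observed here) means the engine stopped before retiring the last dword.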

Contents of some registers, before and after running the DMA test:

DMA_STATUS: 0x44c83d57, 0x44c83156 (IDLE bit is off in the after
status)
GRBM_STATUS: 0x3828, 0x3828
SRBM_STATUS: 0x20c0, 0x20c0
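
A quick way to compare such before/after snapshots is to XOR them and list the bits that changed. The sketch below does that; the only hardware fact it relies on is that IDLE is bit 0 of DMA_STATUS, which is an assumption consistent with the values above. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_IDLE (1u << 0) /* assumed: IDLE is bit 0 of DMA_STATUS */

/* Bits that differ between two snapshots of the same register. */
static uint32_t reg_changed_bits(uint32_t before, uint32_t after)
{
    return before ^ after;
}

static int dma_is_idle(uint32_t status)
{
    return (status & DMA_IDLE) != 0;
}

/* Store the indices of the set bits of `mask` in out[]; return how many. */
static int bit_indices(uint32_t mask, int out[32])
{
    assert(out != NULL);
    int n = 0;
    for (int bit = 0; bit < 32; bit++) {
        if (mask & (1u << bit))
            out[n++] = bit;
    }
    return n;
}
```

For the DMA_STATUS values above, the XOR is 0x00000c01: bit 0 (IDLE) plus bits 10 and 11, whose meaning is not stated in this thread.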

If the DMA WRITE(2) cmd is replaced with a TRAP(7), the DMA
RPTR does not even move a single step - after the timeout, it is
found to be still at 0. And the IDLE status is found to be OFF.
The expected interrupt isn't generated.

If, instead, 4 NOPs(15) are sent, the DMA ring is again found to be
stuck at RPTR=3 with IDLE status as OFF. It seems to have an
affinity towards the 3rd position from the start.

I also ran the CP ring test with a MEM_WRITE operation instead of the
default SET_CONFIG_REG op. The test succeeds, thus proving that
the CP ring can indeed DMA into the system RAM at the lowest perf
profile.

Does this mean that the GPU doesn't support running DMA ring at the
lowest perf profile (157MHz/200MHz)? I do still believe that this
situation might be a result of faulty/missing programming on my part,
though I am not sure what exactly it is that is at fault or is missing.

The machine is a kvm-vfio-enabled VM; the current ArchLinux ISO fails
to initialize the passthru device (-22 from radeon_device_init).

Thanks,
Amol


Re: Minimal GPU setup

2022-02-08 Thread Amol
Thank you Alex.

On 07/02/2022, Deucher, Alexander wrote:
> [AMD Official Use Only]
>
> Most of the register programming in evergreen_gpu_init is required.  That
> code handles things like harvesting (e.g., disabling bad hardware resources)
> and setting sane asic specific settings in some registers.  If you don't do
> it, work may get scheduled to bad or incorrectly configured hardware blocks
> which will lead to hangs or corrupted results.  You can probably skip some
> of them, but I don't remember what is minimally required off hand.  It's
> generally a good idea to re-initialize those registers anyway in case
> someone has previously messed with them (e.g., manual register munging or
> GPU passed through to a VM etc.).

Understood.

>
> Posting the bios is enough to get you a working memory controller and enough
> asic setup to light up displays (basically what you need for pre-OS
> console).  As Christian mentioned, loading the ucodes will get the
> associated engines working so that you can start feeding commands to the
> GPU, but without proper configuration of the various hardware blocks on the
> GPU, you may not have success in feeding data to the GPU.

Understood. I think I wanted a confirmation that the steps I took so far are not
completely incorrect and may be just enough to see some GPU activity,
before I spend more effort programming other blocks. The feedback and a small
but working test help restore the motivation.

Thanks,
Amol

>
> Alex
>
>
> 
> From: amd-gfx  on behalf of Amol
> 
> Sent: Saturday, February 5, 2022 4:47 AM
> To: amd-gfx@lists.freedesktop.org 
> Subject: Minimal GPU setup
>
> Hello,
>
> I am learning to program Radeon HD 7350 by reading the radeon
> driver source in Linux, and the guides/manuals from AMD.
>
> I understand the general flow of initialization the driver performs. I
> have also been able to understand and re-implement the ATOM
> BIOS virtual machine.
>
> I am trying to program the device up from scratch (i.e. bare-metal).
> Do I need to perform all those steps that the driver does? Reading
> the evergreen_gpu_init function is demotivating; it initializes many
> fields and registers which I suspect may not be required for a minimal
> setup.
>
> Is posting the BIOS and loading the microcode enough to get me started
> with running basic tasks (DMA transfers, simple packet processing, etc.)?
>
> Thanks,
> Amol
>


Re: Minimal GPU setup

2022-02-08 Thread Amol
Thank you Christian.

On 06/02/2022, Christian König wrote:
> Hi Amol,
>
> Am 05.02.22 um 10:47 schrieb Amol:

. . .

>> Is posting the BIOS and loading the microcode enough to get me started
>> with running basic tasks (DMA transfers, simple packet processing, etc.)?
>
> Well yes and no. As bare minimum you need the following:
> 1. Firmware loading
> 2. Memory management
> 3. Ring buffer setup
> 4. Hardware initialization
>
> When that is done you can write commands into the ring buffers of the CP
> or SDMA and see if they are executed (see the *_ring_test() functions in
> the driver). SDMA is usually easier to get working.

The DMA-ring-test of making the SDMA write into a WB location in the
system RAM succeeded.

The sequence followed mimics what the Linux driver does for the most part,
up to evergreen_gpu_init. That function, the power mgmt, interrupt mgmt and
indirect buffer mgmt portions, and the entire _modeset_init were skipped for now.

The WB and the CP and DMA ring buffers are PAGE_SIZE buffers in the system
RAM. The GTT is a 512-entry table, in the BAR0 aperture, appropriately filled in
to map the WB, CP and DMA buffers.
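
As an illustration of what "appropriately filled in" can look like, here is a small sketch that builds page-table entries for a few system-RAM pages. Everything specific in it is an assumption: the PTE flag bit positions follow what I recall of the radeon driver's R600-style flags, the 4 KiB GPU page size and 64-bit entry width are assumed, and the table is a plain array here rather than a mapping of the BAR0 aperture.

```c
#include <assert.h>
#include <stdint.h>

#define GTT_ENTRIES   512u
#define GPU_PAGE_SIZE 4096u

/* Assumed PTE flag bits (R600-style); verify against the driver headers. */
#define PTE_VALID     (1ull << 0)
#define PTE_SYSTEM    (1ull << 1)
#define PTE_SNOOPED   (1ull << 2)
#define PTE_READABLE  (1ull << 5)
#define PTE_WRITEABLE (1ull << 6)
#define PTE_SYSTEM_RW (PTE_VALID | PTE_SYSTEM | PTE_SNOOPED | \
                       PTE_READABLE | PTE_WRITEABLE)

/* Combine a page-aligned bus address with the flag bits. */
static uint64_t gtt_make_pte(uint64_t bus_addr, uint64_t flags)
{
    assert((bus_addr & (GPU_PAGE_SIZE - 1)) == 0);
    return bus_addr | flags;
}

/* Map `count` system pages into consecutive entries starting at `first`. */
static void gtt_map(uint64_t *gtt, uint32_t first,
                    const uint64_t *pages, uint32_t count)
{
    assert(first <= GTT_ENTRIES && count <= GTT_ENTRIES - first);
    for (uint32_t i = 0; i < count; i++)
        gtt[first + i] = gtt_make_pte(pages[i], PTE_SYSTEM_RW);
}

/* GPU address at which the page behind entry `index` becomes visible. */
static uint64_t gtt_gpu_addr(uint64_t gtt_start, uint32_t index)
{
    return gtt_start + (uint64_t)index * GPU_PAGE_SIZE;
}
```

With three entries (WB, CP ring, DMA ring), the ring base and write-back addresses programmed into the GPU would then be the values returned by gtt_gpu_addr() for indices 0 to 2, not the bus addresses themselves.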

>
> Once you have that working you can worry about IBs (indirect buffers),
> which are basically subroutine calls written into the ring buffers.
>
> Most commands (like copy from A to B, fill something, write value X to
> memory or write X into register Y) can be used from the ring buffers
> directly, but IIRC some context switching commands which are part of the
> rendering process require special handling.
>
> But keep in mind that all of this will just be horribly slow because the
> ASIC runs with the bootup clocks which are something like 100MHz or even
> only 17MHz on very old models. To change that you need to implement
> power management, interrupt handling, etc.

Understood. Yes, the DPM and the IH portions. I think by programming only
for the hardware I have I can manage to set them up with comparatively less
effort.

Thanks,
Amol

>
> Good luck,
> Christian.
>
>>
>> Thanks,
>> Amol
>
>


Minimal GPU setup

2022-02-05 Thread Amol
Hello,

I am learning to program Radeon HD 7350 by reading the radeon
driver source in Linux, and the guides/manuals from AMD.

I understand the general flow of initialization the driver performs. I
have also been able to understand and re-implement the ATOM
BIOS virtual machine.

I am trying to program the device up from scratch (i.e. bare-metal).
Do I need to perform all those steps that the driver does? Reading
the evergreen_gpu_init function is demotivating; it initializes many
fields and registers which I suspect may not be required for a minimal
setup.

Is posting the BIOS and loading the microcode enough to get me started
with running basic tasks (DMA transfers, simple packet processing, etc.)?

Thanks,
Amol


Re: [radeon] connector_info_from_object_table

2021-11-19 Thread Amol
On 19/11/2021, Alex Deucher wrote:
> On Thu, Nov 18, 2021 at 11:37 AM Amol wrote:
>>
>> Hello,
>>
>> The function radeon_get_atom_connector_info_from_object_table,
>> at location [1], ends up parsing ATOM_COMMON_TABLE_HEADER
>> as ATOM_COMMON_RECORD_HEADER if
>> enc_obj->asObjects[k].usRecordOffset is zero. It is found to be zero
>> in the BIOS found at [2].
>>
>> Thankfully, the loop that follows exits immediately since ucRecordSize
>> is 0 because
>> (ATOM_COMMON_TABLE_HEADER.usStructureSize & 0xff00) is zero.
>> But, with suitable values in the usStructureSize, the loop can be made to
>> run and parse garbage.
>>
>> A similar loop exists when parsing the conn objects.
>
> Can you send a patch to make it more robust?

Sent in a separate email.

Thanks,
Amol

>
> Thanks,
>
> Alex
>
>>
>> -Amol
>>
>> [1]
>> https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/radeon/radeon_atombios.c#L652
>> [2] https://www.techpowerup.com/vgabios/211981/211981
>


[PATCH] drm/radeon: more sanity checks (usRecordOffset) to obj info record parsing

2021-11-19 Thread Amol Surati
When parsing Encoder, Connector, or Router records, if the
usRecordOffset field is 0, the driver ends up dereferencing
ATOM_COMMON_TABLE_HEADER of the Object Table as
ATOM_COMMON_RECORD_HEADER.

A BIOS, which triggers such dereferences when parsing the
Encoder records, is found on Cedar Radeon HD 7350/8350 GPU.

Allow record dereferences only if usRecordOffset is non-zero.

Signed-off-by: Amol Surati 
---
 drivers/gpu/drm/radeon/radeon_atombios.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_atombios.c b/drivers/gpu/drm/radeon/radeon_atombios.c
index 28c4413f4..bab0e1cc2 100644
--- a/drivers/gpu/drm/radeon/radeon_atombios.c
+++ b/drivers/gpu/drm/radeon/radeon_atombios.c
@@ -646,14 +646,15 @@ bool radeon_get_atom_connector_info_from_object_table(struct drm_device *dev)
 				if (grph_obj_type == GRAPH_OBJECT_TYPE_ENCODER) {
 					for (k = 0; k < enc_obj->ucNumberOfObjects; k++) {
 						u16 encoder_obj = le16_to_cpu(enc_obj->asObjects[k].usObjectID);
+						u16 rec_offset = le16_to_cpu(enc_obj->asObjects[k].usRecordOffset);
 						if (le16_to_cpu(path->usGraphicObjIds[j]) == encoder_obj) {
 							ATOM_COMMON_RECORD_HEADER *record = (ATOM_COMMON_RECORD_HEADER *)
-								(ctx->bios + data_offset +
-								 le16_to_cpu(enc_obj->asObjects[k].usRecordOffset));
+								(ctx->bios + data_offset + rec_offset);
 							ATOM_ENCODER_CAP_RECORD *cap_record;
 							u16 caps = 0;
 
-							while (record->ucRecordSize > 0 &&
+							while (rec_offset > 0 &&
+							       record->ucRecordSize > 0 &&
 							       record->ucRecordType > 0 &&
 							       record->ucRecordType <= ATOM_MAX_OBJECT_RECORD_NUMBER) {
 								switch (record->ucRecordType) {
@@ -677,10 +678,10 @@ bool radeon_get_atom_connector_info_from_object_table(struct drm_device *dev)
 				} else if (grph_obj_type == GRAPH_OBJECT_TYPE_ROUTER) {
 					for (k = 0; k < router_obj->ucNumberOfObjects; k++) {
 						u16 router_obj_id = le16_to_cpu(router_obj->asObjects[k].usObjectID);
+						u16 rec_offset = le16_to_cpu(router_obj->asObjects[k].usRecordOffset);
 						if (le16_to_cpu(path->usGraphicObjIds[j]) == router_obj_id) {
 							ATOM_COMMON_RECORD_HEADER *record = (ATOM_COMMON_RECORD_HEADER *)
-								(ctx->bios + data_offset +
-								 le16_to_cpu(router_obj->asObjects[k].usRecordOffset));
+								(ctx->bios + data_offset + rec_offset);
 							ATOM_I2C_RECORD *i2c_record;
 							ATOM_I2C_ID_CONFIG_ACCESS *i2c_config;
 							ATOM_ROUTER_DDC_PATH_SELECT_RECORD *ddc_path;
@@ -702,7 +703,8 @@ bool radeon_get_atom_connector_info_from_object_table(struct drm_device *dev)
 									break;
 							}
 
-							while (record->ucRecordSize > 0 &&
+							while (rec_offset > 0 &&
+							       record->ucRecordSize > 0 &&
 							       record->ucRecordType > 0 &&
 							       record->ucRecordType <= ATOM_MAX_OBJECT_RECORD_NUMBER) {
 								switch (record->ucRecordType) {
@@ -753,19 +755,18 @@ bool radeon_get_atom_connector_info_from_object_table(struct dr

[radeon] connector_info_from_object_table

2021-11-18 Thread Amol
Hello,

The function radeon_get_atom_connector_info_from_object_table,
at location [1], ends up parsing ATOM_COMMON_TABLE_HEADER
as ATOM_COMMON_RECORD_HEADER if
enc_obj->asObjects[k].usRecordOffset is zero. It is found to be zero
in the BIOS found at [2].

Thankfully, the loop that follows exits immediately since ucRecordSize
is 0 because
(ATOM_COMMON_TABLE_HEADER.usStructureSize & 0xff00) is zero.
But, with suitable values in the usStructureSize, the loop can be made to
run and parse garbage.

A similar loop exists when parsing the conn objects.

-Amol

[1] 
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/radeon/radeon_atombios.c#L652
[2] https://www.techpowerup.com/vgabios/211981/211981
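
The aliasing described above can be reproduced outside the driver. The sketch below is not the driver code: it walks a byte buffer with a hypothetical helper, count_records(), and reads the two header bytes directly (type first, then size). When the record offset is 0, those two bytes are the low and high bytes of the table header's usStructureSize, which is why the guard on the offset matters. The bounds check against the buffer length is an extra that the thread does not discuss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the records reachable from data_offset + rec_offset.
 * Each record starts with two bytes: type, then size (in bytes).
 * A rec_offset of 0 means "no records"; without that check the walk
 * would start on the table header itself.
 */
static unsigned count_records(const uint8_t *bios, size_t bios_len,
                              size_t data_offset, uint16_t rec_offset,
                              uint8_t max_type)
{
    assert(bios != NULL);
    if (rec_offset == 0)
        return 0;

    size_t pos = data_offset + rec_offset;
    unsigned n = 0;

    while (pos + 2 <= bios_len) {
        uint8_t type = bios[pos];
        uint8_t size = bios[pos + 1];

        /* Same termination conditions as the loops in the driver. */
        if (size == 0 || type == 0 || type > max_type)
            break;
        n++;
        pos += size; /* size > 0, so the walk always advances */
    }
    return n;
}
```

For example, a table header whose usStructureSize is 0x0305 would look like a record of type 5 and size 3 if the walk were allowed to start at offset 0.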