On 16/01/2019 18:04, Lionel Landwerlin wrote:
On 16/01/2019 16:31, Chris Wilson wrote:
Quoting Lionel Landwerlin (2019-01-16 16:25:26)
On 16/01/2019 16:05, Chris Wilson wrote:
Quoting Lionel Landwerlin (2019-01-16 15:58:00)
On 16/01/2019 15:52, Chris Wilson wrote:
Quoting Lionel Landwerlin (2019-01-16 15:36:20)
@@ -1877,6 +1883,21 @@ struct drm_i915_private {
wait_queue_head_t poll_wq;
bool pollin;
+ /**
+ * Atomic counter incremented by the
interrupt
+ * handling code for each OA half full
interrupt
+ * received.
+ */
+ atomic64_t half_full_count;
+
+ /**
+ * Copy of the atomic half_full_count
that was last
+ * processed in the i915-perf driver. If
both counters
+ * differ, there is data available to
read in the OA
+ * buffer.
+ */
+ u64 half_full_count_last;
Eh? But why a relatively expensive atomic64. You only need one
bit, and
reading the tail pointer from WB memory should just be cheap. You
should
be able to sample the perf ringbuffer pointers very cheaply...
What am I
missing?
-Chris
Fair comment.
The thing is with this series there are 2 mechanism that notify
the poll_wq.
One is the hrtimer that kicks in at regular interval and polls the
register with the workaround.
The other is the interrupt which doesn't read the registers and
workaround.
What's the complication with the workaround?
It's a bit more than just looking at registers, we actually have to
look
at the content of the buffer to figure out what landed in memory.
The register values are not synchronized with the memory writes...
I don't want to look at registers at all for polling, and you shouldn't
need to since communication is via a ringbuf.
There is a comment in the code (i915_perf_poll_locked) about not
checking the register after each wakeup because that may be a very
hot path.
The atomic64 sounded like a lesser evil.
I'm clearly not understanding something here...
Does the hardware not do:
update ringbuf data;
wmb() (or post to global observation point in their parlance)
update ringbuf tail
As far as I understand, the OA unit :
sends its memory write requests to the memory controller
immediately updates the ringbuf tail, without waiting for the
previous requests to complete
By experimentation, I've haven't seen a delta between what is available
in memory and what the OA tail register indicate larger than 768 bytes,
which is about 3 OA reports at the largest size.
There is probably a maximum number of write requests the OA unit can
queue before blocking.
So again maybe you would prefer a 2 stage mechanism :
OA interrupt -----> head/tail pointer worker -----> wake up userspace
hrtimer head/tail pointer --|
Then we just need to sample the ringbuf tail and compare against how far
we read last time?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx