[Intel-gfx] [REGRESSION] drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()

2017-05-18 Thread Nicolai Stange
Hi,

my system (always) locks up when booting a next-20170515 kernel.

No oops. Sending magic sysrqs over serial doesn't cause any reaction.

Last few console messages before death are:

  [7.089221] Console: switching to colour frame buffer device 128x48
  [7.101470] radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device
  [7.111292] [drm] Initialized radeon 2.50.0 20080528 for 0000:01:00.0 on minor 1
  [7.113056] [drm] Initialized i915 1.6.0 20170403 for 0000:00:02.0 on minor 0
  [7.113446] [Firmware Bug]: ACPI(PEGP) defines _DOD but not _DOS
  [7.114227] ACPI: Video Device [PEGP] (multi-head: yes  rom: no  post: no)
  [7.114798] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:4f/LNXVIDEO:00/input/input12
  [7.116253] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
  [7.130432] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:01/input/input13
  [7.130481] [drm] Initialized i915 1.6.0 20170403 for 0000:00:02.0 on minor 0


Bisection led to commit 25112b64b3d2 ("drm/i915: Wait for all engines
to be idle as part of i915_gem_wait_for_idle()"). Reverting this commit
on top of next-20170515 fixes the issue for me.

According to lspci, my i915 is a

  00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor 
Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])

or in raw form:

  00:02.0 0300: 8086:0416 (rev 06)


Thanks,

Nicolai
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 3/4] drm/dp: start a DPCD based DP sink/branch device quirk database

2017-05-18 Thread Jani Nikula
Face the fact, there are Display Port sink and branch devices out there
in the wild that don't follow the Display Port specifications, or they
have bugs, or just otherwise require special treatment. Start a common
quirk database the drivers can query based on the DP device
identification. At least for now, we leave the workarounds for the
drivers to implement as they see fit.

For starters, add a branch device that can't handle full 24-bit main
link M and N attributes properly. Naturally, the
workaround of reducing main link attributes for all devices ended up in
regressions for other devices. So here we are.
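The lookup described above can be sketched in isolation as follows. The names mirror the patch, but this is a standalone illustration compiled outside the kernel, not the kernel implementation itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum dp_quirk { DP_QUIRK_LIMITED_M_N };

struct quirk_entry {
	uint8_t oui[3];     /* first three bytes of the DPCD ident */
	bool is_branch;     /* branch vs sink device */
	uint32_t quirks;    /* bit mask of dp_quirk values */
};

static const struct quirk_entry quirk_list[] = {
	/* Analogix 7737 branch device needs reduced M and N at HBR2 */
	{ { 0x00, 0x22, 0xb9 }, true, 1u << DP_QUIRK_LIMITED_M_N },
};

/* Collect the quirk bits matching the device's OUI and branch/sink type. */
static uint32_t get_quirks(const uint8_t oui[3], bool is_branch)
{
	uint32_t quirks = 0;
	size_t i;

	for (i = 0; i < sizeof(quirk_list) / sizeof(quirk_list[0]); i++) {
		if (quirk_list[i].is_branch != is_branch)
			continue;
		if (memcmp(quirk_list[i].oui, oui, 3) != 0)
			continue;
		quirks |= quirk_list[i].quirks;
	}
	return quirks;
}

static bool has_quirk(uint32_t quirks, enum dp_quirk q)
{
	return quirks & (1u << q);
}
```

A driver would cache the mask once at detection time and test individual bits at the point where each workaround applies.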

v2: Rebase on DRM DP desc read helpers

v3: Fix the OUI memcmp blunder (Clint)

Cc: Ville Syrjälä 
Cc: Dhinakaran Pandiyan 
Cc: Clint Taylor 
Cc: Adam Jackson 
Cc: Harry Wentland 
Signed-off-by: Jani Nikula 
---
 drivers/gpu/drm/drm_dp_helper.c | 52 +++--
 include/drm/drm_dp_helper.h | 32 +
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
index 52e0ca9a5bb1..213fb837e1c4 100644
--- a/drivers/gpu/drm/drm_dp_helper.c
+++ b/drivers/gpu/drm/drm_dp_helper.c
@@ -1209,6 +1209,51 @@ int drm_dp_stop_crc(struct drm_dp_aux *aux)
 }
 EXPORT_SYMBOL(drm_dp_stop_crc);
 
+struct dpcd_quirk {
+   u8 oui[3];
+   bool is_branch;
+   u32 quirks;
+};
+
+#define OUI(first, second, third) { (first), (second), (third) }
+
+static const struct dpcd_quirk dpcd_quirk_list[] = {
+   /* Analogix 7737 needs reduced M and N at HBR2 link rates */
+   { OUI(0x00, 0x22, 0xb9), true, BIT(DP_DPCD_QUIRK_LIMITED_M_N) },
+};
+
+#undef OUI
+
+/*
+ * Get a bit mask of DPCD quirks for the sink/branch device identified by
+ * ident. The quirk data is shared but it's up to the drivers to act on the
+ * data.
+ *
+ * For now, only the OUI (first three bytes) is used, but this may be extended
+ * to device identification string and hardware/firmware revisions later.
+ */
+static u32
+drm_dp_get_quirks(const struct drm_dp_dpcd_ident *ident, bool is_branch)
+{
+   const struct dpcd_quirk *quirk;
+   u32 quirks = 0;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(dpcd_quirk_list); i++) {
+   quirk = &dpcd_quirk_list[i];
+
+   if (quirk->is_branch != is_branch)
+   continue;
+
+   if (memcmp(quirk->oui, ident->oui, sizeof(ident->oui)) != 0)
+   continue;
+
+   quirks |= quirk->quirks;
+   }
+
+   return quirks;
+}
+
 /**
  * drm_dp_read_desc - read sink/branch descriptor from DPCD
  * @aux: DisplayPort AUX channel
@@ -1231,14 +1276,17 @@ int drm_dp_read_desc(struct drm_dp_aux *aux, struct drm_dp_desc *desc,
if (ret < 0)
return ret;
 
+   desc->quirks = drm_dp_get_quirks(ident, is_branch);
+
dev_id_len = strnlen(ident->device_id, sizeof(ident->device_id));
 
-   DRM_DEBUG_KMS("DP %s: OUI %*phD dev-ID %*pE HW-rev %d.%d SW-rev %d.%d\n",
+   DRM_DEBUG_KMS("DP %s: OUI %*phD dev-ID %*pE HW-rev %d.%d SW-rev %d.%d quirks 0x%04x\n",
  is_branch ? "branch" : "sink",
  (int)sizeof(ident->oui), ident->oui,
  dev_id_len, ident->device_id,
  ident->hw_rev >> 4, ident->hw_rev & 0xf,
- ident->sw_major_rev, ident->sw_minor_rev);
+ ident->sw_major_rev, ident->sw_minor_rev,
+ desc->quirks);
 
return 0;
 }
diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
index aee5b96b51d7..717cb8496725 100644
--- a/include/drm/drm_dp_helper.h
+++ b/include/drm/drm_dp_helper.h
@@ -1090,12 +1090,44 @@ struct drm_dp_dpcd_ident {
 /**
  * struct drm_dp_desc - DP branch/sink device descriptor
  * @ident: DP device identification from DPCD 0x400 (sink) or 0x500 (branch).
+ * @quirks: Quirks; use drm_dp_has_quirk() to query for the quirks.
  */
 struct drm_dp_desc {
struct drm_dp_dpcd_ident ident;
+   u32 quirks;
 };
 
 int drm_dp_read_desc(struct drm_dp_aux *aux, struct drm_dp_desc *desc,
 bool is_branch);
 
+/**
+ * enum drm_dp_quirk - Display Port sink/branch device specific quirks
+ *
+ * Display Port sink and branch devices in the wild have a variety of bugs, try
+ * to collect them here. The quirks are shared, but it's up to the drivers to
+ * implement workarounds for them.
+ */
+enum drm_dp_quirk {
+   /**
+* @DP_DPCD_QUIRK_LIMITED_M_N:
+*
+* The device requires main link attributes Mdiv and Ndiv to be limited
+* to 16 bits.
+*/
+   DP_DPCD_QUIRK_LIMITED_M_N,
+};
+
+/**
+ * drm_dp_has_quirk() - does the DP device have a specific quirk
+ * @desc: Device descriptor filled by drm_dp_read_desc()
+ * @quirk: Quirk to query for
+ *
+ * Return true if DP device identified by @desc has @quirk.
+ */
+static inline boo

[Intel-gfx] [PATCH 4/4] drm/i915: Detect USB-C specific dongles before reducing M and N

2017-05-18 Thread Jani Nikula
The Analogix 7737 DP to HDMI converter requires reduced M and N values
when to operate correctly at HBR2. We tried to reduce the M/N values for
all devices in commit 9a86cda07af2 ("drm/i915/dp: reduce link M/N
parameters"), but that regressed some other sinks. Detect this IC by its
OUI value of 0x0022B9 via the DPCD quirk list, and only reduce the M/N
values for that.

v2 by Jani: Rebased on the DP quirk database

v3 by Jani: Rebased on the reworked DP quirk database

v4 by Jani: Improve commit message (Daniel)

Fixes: 9a86cda07af2 ("drm/i915/dp: reduce link M/N parameters")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93578
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100755
Cc: Jani Nikula 
Cc: Dhinakaran Pandiyan 
Cc: Ville Syrjälä 
Reviewed-by: Daniel Vetter 
Signed-off-by: Clint Taylor 
Signed-off-by: Jani Nikula 
---
 drivers/gpu/drm/i915/i915_drv.h  |  3 ++-
 drivers/gpu/drm/i915/intel_display.c | 22 ++
 drivers/gpu/drm/i915/intel_dp.c  |  8 ++--
 drivers/gpu/drm/i915/intel_dp_mst.c  |  5 -
 4 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 17883a84b8c1..31ca580e47ae 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -611,7 +611,8 @@ struct intel_link_m_n {
 
 void intel_link_compute_m_n(int bpp, int nlanes,
int pixel_clock, int link_clock,
-   struct intel_link_m_n *m_n);
+   struct intel_link_m_n *m_n,
+   bool reduce_m_n);
 
 /* Interface history:
  *
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 8217ed0e7132..ea4f116bd410 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -6116,7 +6116,7 @@ static int ironlake_fdi_compute_config(struct intel_crtc *intel_crtc,
pipe_config->fdi_lanes = lane;
 
intel_link_compute_m_n(pipe_config->pipe_bpp, lane, fdi_dotclock,
-  link_bw, &pipe_config->fdi_m_n);
+  link_bw, &pipe_config->fdi_m_n, false);
 
ret = ironlake_check_fdi_lanes(dev, intel_crtc->pipe, pipe_config);
if (ret == -EINVAL && pipe_config->pipe_bpp > 6*3) {
@@ -6292,7 +6292,8 @@ intel_reduce_m_n_ratio(uint32_t *num, uint32_t *den)
 }
 
 static void compute_m_n(unsigned int m, unsigned int n,
-   uint32_t *ret_m, uint32_t *ret_n)
+   uint32_t *ret_m, uint32_t *ret_n,
+   bool reduce_m_n)
 {
/*
 * Reduce M/N as much as possible without loss in precision. Several DP
@@ -6300,9 +6301,11 @@ static void compute_m_n(unsigned int m, unsigned int n,
 * values. The passed in values are more likely to have the least
 * significant bits zero than M after rounding below, so do this first.
 */
-   while ((m & 1) == 0 && (n & 1) == 0) {
-   m >>= 1;
-   n >>= 1;
+   if (reduce_m_n) {
+   while ((m & 1) == 0 && (n & 1) == 0) {
+   m >>= 1;
+   n >>= 1;
+   }
}
 
*ret_n = min_t(unsigned int, roundup_pow_of_two(n), DATA_LINK_N_MAX);
@@ -6313,16 +6316,19 @@ static void compute_m_n(unsigned int m, unsigned int n,
 void
 intel_link_compute_m_n(int bits_per_pixel, int nlanes,
   int pixel_clock, int link_clock,
-  struct intel_link_m_n *m_n)
+  struct intel_link_m_n *m_n,
+  bool reduce_m_n)
 {
m_n->tu = 64;
 
compute_m_n(bits_per_pixel * pixel_clock,
link_clock * nlanes * 8,
-   &m_n->gmch_m, &m_n->gmch_n);
+   &m_n->gmch_m, &m_n->gmch_n,
+   reduce_m_n);
 
compute_m_n(pixel_clock, link_clock,
-   &m_n->link_m, &m_n->link_n);
+   &m_n->link_m, &m_n->link_n,
+   reduce_m_n);
 }
 
 static inline bool intel_panel_use_ssc(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 2a5f385e8f44..1ae9de5cf39c 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -1627,6 +1627,8 @@ intel_dp_compute_config(struct intel_encoder *encoder,
int link_avail, link_clock;
int common_len;
uint8_t link_bw, rate_select;
+   bool reduce_m_n = drm_dp_has_quirk(&intel_dp->desc,
+  DP_DPCD_QUIRK_LIMITED_M_N);
 
common_len = intel_dp_common_len_rate_limit(intel_dp,
intel_dp->max_link_rate);
@@ -1759,7 +1761,8 @@ intel_dp_compute_config(struct intel_encoder *encoder,
intel_link_compute_m_n(bpp, lane_count,
   adjust

[Intel-gfx] [PATCH 1/4] drm/dp: add helper for reading DP sink/branch device desc from DPCD

2017-05-18 Thread Jani Nikula
Reviewed-by: Daniel Vetter 
Signed-off-by: Jani Nikula 
---
 drivers/gpu/drm/drm_dp_helper.c | 35 +++
 include/drm/drm_dp_helper.h | 19 +++
 2 files changed, 54 insertions(+)

diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
index 3e5f52110ea1..52e0ca9a5bb1 100644
--- a/drivers/gpu/drm/drm_dp_helper.c
+++ b/drivers/gpu/drm/drm_dp_helper.c
@@ -1208,3 +1208,38 @@ int drm_dp_stop_crc(struct drm_dp_aux *aux)
return 0;
 }
 EXPORT_SYMBOL(drm_dp_stop_crc);
+
+/**
+ * drm_dp_read_desc - read sink/branch descriptor from DPCD
+ * @aux: DisplayPort AUX channel
+ * @desc: Device descriptor to fill from DPCD
+ * @is_branch: true for branch devices, false for sink devices
+ *
+ * Read DPCD 0x400 (sink) or 0x500 (branch) into @desc. Also debug log the
+ * identification.
+ *
+ * Returns 0 on success or a negative error code on failure.
+ */
+int drm_dp_read_desc(struct drm_dp_aux *aux, struct drm_dp_desc *desc,
+bool is_branch)
+{
+   struct drm_dp_dpcd_ident *ident = &desc->ident;
+   unsigned int offset = is_branch ? DP_BRANCH_OUI : DP_SINK_OUI;
+   int ret, dev_id_len;
+
+   ret = drm_dp_dpcd_read(aux, offset, ident, sizeof(*ident));
+   if (ret < 0)
+   return ret;
+
+   dev_id_len = strnlen(ident->device_id, sizeof(ident->device_id));
+
+   DRM_DEBUG_KMS("DP %s: OUI %*phD dev-ID %*pE HW-rev %d.%d SW-rev %d.%d\n",
+ is_branch ? "branch" : "sink",
+ (int)sizeof(ident->oui), ident->oui,
+ dev_id_len, ident->device_id,
+ ident->hw_rev >> 4, ident->hw_rev & 0xf,
+ ident->sw_major_rev, ident->sw_minor_rev);
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_dp_read_desc);
diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
index f7007e544f29..aee5b96b51d7 100644
--- a/include/drm/drm_dp_helper.h
+++ b/include/drm/drm_dp_helper.h
@@ -1079,4 +1079,23 @@ void drm_dp_aux_unregister(struct drm_dp_aux *aux);
 int drm_dp_start_crc(struct drm_dp_aux *aux, struct drm_crtc *crtc);
 int drm_dp_stop_crc(struct drm_dp_aux *aux);
 
+struct drm_dp_dpcd_ident {
+   u8 oui[3];
+   u8 device_id[6];
+   u8 hw_rev;
+   u8 sw_major_rev;
+   u8 sw_minor_rev;
+} __packed;
+
+/**
+ * struct drm_dp_desc - DP branch/sink device descriptor
+ * @ident: DP device identification from DPCD 0x400 (sink) or 0x500 (branch).
+ */
+struct drm_dp_desc {
+   struct drm_dp_dpcd_ident ident;
+};
+
+int drm_dp_read_desc(struct drm_dp_aux *aux, struct drm_dp_desc *desc,
+bool is_branch);
+
 #endif /* _DRM_DP_HELPER_H_ */
-- 
2.11.0



[Intel-gfx] [PATCH 2/4] drm/i915: use drm DP helper to read DPCD desc

2017-05-18 Thread Jani Nikula
Switch to using the common DP helpers instead of using our own.

v2: also remove leftover struct intel_dp_desc (Daniel)

Reviewed-by: Daniel Vetter 
Signed-off-by: Jani Nikula 
---
 drivers/gpu/drm/i915/intel_dp.c | 37 -
 drivers/gpu/drm/i915/intel_drv.h| 13 +
 drivers/gpu/drm/i915/intel_lspcon.c |  2 +-
 3 files changed, 6 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 4a6feb6a69bd..2a5f385e8f44 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -1548,37 +1548,6 @@ static void intel_dp_print_rates(struct intel_dp *intel_dp)
DRM_DEBUG_KMS("common rates: %s\n", str);
 }
 
-bool
-__intel_dp_read_desc(struct intel_dp *intel_dp, struct intel_dp_desc *desc)
-{
-   u32 base = drm_dp_is_branch(intel_dp->dpcd) ? DP_BRANCH_OUI :
- DP_SINK_OUI;
-
-   return drm_dp_dpcd_read(&intel_dp->aux, base, desc, sizeof(*desc)) ==
-  sizeof(*desc);
-}
-
-bool intel_dp_read_desc(struct intel_dp *intel_dp)
-{
-   struct intel_dp_desc *desc = &intel_dp->desc;
-   bool oui_sup = intel_dp->dpcd[DP_DOWN_STREAM_PORT_COUNT] &
-  DP_OUI_SUPPORT;
-   int dev_id_len;
-
-   if (!__intel_dp_read_desc(intel_dp, desc))
-   return false;
-
-   dev_id_len = strnlen(desc->device_id, sizeof(desc->device_id));
-   DRM_DEBUG_KMS("DP %s: OUI %*phD%s dev-ID %*pE HW-rev %d.%d SW-rev %d.%d\n",
- drm_dp_is_branch(intel_dp->dpcd) ? "branch" : "sink",
- (int)sizeof(desc->oui), desc->oui, oui_sup ? "" : "(NS)",
- dev_id_len, desc->device_id,
- desc->hw_rev >> 4, desc->hw_rev & 0xf,
- desc->sw_major_rev, desc->sw_minor_rev);
-
-   return true;
-}
-
 int
 intel_dp_max_link_rate(struct intel_dp *intel_dp)
 {
@@ -3662,7 +3631,8 @@ intel_edp_init_dpcd(struct intel_dp *intel_dp)
if (!intel_dp_read_dpcd(intel_dp))
return false;
 
-   intel_dp_read_desc(intel_dp);
+   drm_dp_read_desc(&intel_dp->aux, &intel_dp->desc,
+drm_dp_is_branch(intel_dp->dpcd));
 
if (intel_dp->dpcd[DP_DPCD_REV] >= 0x11)
dev_priv->no_aux_handshake = intel_dp->dpcd[DP_MAX_DOWNSPREAD] &
@@ -4677,7 +4647,8 @@ intel_dp_long_pulse(struct intel_connector *intel_connector)
 
intel_dp_print_rates(intel_dp);
 
-   intel_dp_read_desc(intel_dp);
+   drm_dp_read_desc(&intel_dp->aux, &intel_dp->desc,
+drm_dp_is_branch(intel_dp->dpcd));
 
intel_dp_configure_mst(intel_dp);
 
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index bd500977b3fc..71f94a01aedd 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -938,14 +938,6 @@ enum link_m_n_set {
M2_N2
 };
 
-struct intel_dp_desc {
-   u8 oui[3];
-   u8 device_id[6];
-   u8 hw_rev;
-   u8 sw_major_rev;
-   u8 sw_minor_rev;
-} __packed;
-
 struct intel_dp_compliance_data {
unsigned long edid;
uint8_t video_pattern;
@@ -996,7 +988,7 @@ struct intel_dp {
/* Max rate for the current link */
int max_link_rate;
/* sink or branch descriptor */
-   struct intel_dp_desc desc;
+   struct drm_dp_desc desc;
struct drm_dp_aux aux;
enum intel_display_power_domain aux_power_domain;
uint8_t train_set[4];
@@ -1571,9 +1563,6 @@ static inline unsigned int intel_dp_unused_lane_mask(int lane_count)
 }
 
 bool intel_dp_read_dpcd(struct intel_dp *intel_dp);
-bool __intel_dp_read_desc(struct intel_dp *intel_dp,
- struct intel_dp_desc *desc);
-bool intel_dp_read_desc(struct intel_dp *intel_dp);
 int intel_dp_link_required(int pixel_clock, int bpp);
 int intel_dp_max_data_rate(int max_link_clock, int max_lanes);
 bool intel_digital_port_connected(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_lspcon.c b/drivers/gpu/drm/i915/intel_lspcon.c
index 71cbe9c08932..5abef482eacf 100644
--- a/drivers/gpu/drm/i915/intel_lspcon.c
+++ b/drivers/gpu/drm/i915/intel_lspcon.c
@@ -240,7 +240,7 @@ bool lspcon_init(struct intel_digital_port *intel_dig_port)
return false;
}
 
-   intel_dp_read_desc(dp);
+   drm_dp_read_desc(&dp->aux, &dp->desc, drm_dp_is_branch(dp->dpcd));
 
DRM_DEBUG_KMS("Success: LSPCON init\n");
return true;
-- 
2.11.0



[Intel-gfx] [PATCH 0/4] drm/dp: device identification and quirks ~v3

2017-05-18 Thread Jani Nikula
Update on [1].

BR,
Jani.


[1] http://mid.mail-archive.com/cover.1495030890.git.jani.nikula@intel.com


Jani Nikula (4):
  drm/dp: add helper for reading DP sink/branch device desc from DPCD
  drm/i915: use drm DP helper to read DPCD desc
  drm/dp: start a DPCD based DP sink/branch device quirk database
  drm/i915: Detect USB-C specific dongles before reducing M and N

 drivers/gpu/drm/drm_dp_helper.c  | 83 
 drivers/gpu/drm/i915/i915_drv.h  |  3 +-
 drivers/gpu/drm/i915/intel_display.c | 22 ++
 drivers/gpu/drm/i915/intel_dp.c  | 45 +--
 drivers/gpu/drm/i915/intel_dp_mst.c  |  5 ++-
 drivers/gpu/drm/i915/intel_drv.h | 13 +-
 drivers/gpu/drm/i915/intel_lspcon.c  |  2 +-
 include/drm/drm_dp_helper.h  | 51 ++
 8 files changed, 166 insertions(+), 58 deletions(-)

-- 
2.11.0



[Intel-gfx] [PATCH] drm/i915: Fix new -Wint-in-bool-context gcc compiler warning

2017-05-18 Thread Hans de Goede
This commit fixes the following compiler warning:

drivers/gpu/drm/i915/intel_dsi.c: In function ‘intel_dsi_prepare’:
drivers/gpu/drm/i915/intel_dsi.c:1487:23: warning:
?: using integer constants in boolean context [-Wint-in-bool-context]
   PORT_A ? PORT_C : PORT_A),
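A standalone sketch of the macro change (values here mirror the i915 port enum, but the register offsets are made-up placeholders, not the real MMIO addresses): the old form relied on PORT_A being 0 and used the port value itself as a boolean, which is what gcc 7's -Wint-in-bool-context complains about; the new form states the comparison explicitly and selects the same register for both DSI ports.

```c
#include <assert.h>

/* Illustrative port enum matching the i915 convention that PORT_A == 0. */
enum port { PORT_A = 0, PORT_B = 1, PORT_C = 2 };

/* Old macro: any nonzero port selects c; works only because PORT_A is 0. */
#define MIPI_PORT_OLD(port, a, c) ((port) ? (c) : (a))

/* New macro: explicit comparison, no integer used in boolean context. */
#define MIPI_PORT_NEW(port, a, c) (((port) == PORT_A) ? (a) : (c))
```

For ports A and C (the only ones the macro is documented to handle) the two forms are equivalent, so the rewrite is purely a warning fix.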

Signed-off-by: Hans de Goede 
---
 drivers/gpu/drm/i915/i915_reg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 11b12f412492..db5c49531033 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -8276,7 +8276,7 @@ enum {
 
 /* MIPI DSI registers */
 
-#define _MIPI_PORT(port, a, c) ((port) ? c : a)/* ports A and C only */
+#define _MIPI_PORT(port, a, c) (((port) == PORT_A) ? a : c)/* ports A and C only */
 #define _MMIO_MIPI(port, a, c) _MMIO(_MIPI_PORT(port, a, c))
 
 #define MIPIO_TXESC_CLK_DIV1   _MMIO(0x160004)
-- 
2.12.2



Re: [Intel-gfx] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Joonas Lahtinen
On ke, 2017-05-17 at 16:40 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Building on top of the previous patch which exported the concept
> of engine classes and instances, we can also use this instead of
> the current awkward engine selection uAPI.
> 
> This is primarily interesting for the VCS engine selection which
> is a) currently done via disjoint set of flags, and b) the
> current I915_EXEC_BSD flags has different semantics depending on
> the underlying hardware which is bad.
> 
> Proposed idea here is to reserve 16-bits of flags, to pass in
> the engine class and instance (8 bits each), and a new flag
named I915_EXEC_CLASS_INSTANCE to tell the kernel this new engine
> selection API is in use.
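The quoted proposal (8 bits of class and 8 bits of instance inside a reserved 16-bit flag field) could be sketched roughly like this; the shift position and helper names are invented for illustration and are not the actual uAPI:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit position for the 16-bit class/instance field inside
 * the 64-bit execbuffer flags; the real layout is up to the patch. */
#define EXEC_CLASS_INSTANCE_SHIFT 20

/* Pack an 8-bit engine class and an 8-bit instance into the flags. */
#define EXEC_ENGINE(cls, inst) \
	((uint64_t)((((cls) & 0xff) << 8) | ((inst) & 0xff)) \
	 << EXEC_CLASS_INSTANCE_SHIFT)

static uint8_t exec_class(uint64_t flags)
{
	return (flags >> (EXEC_CLASS_INSTANCE_SHIFT + 8)) & 0xff;
}

static uint8_t exec_instance(uint64_t flags)
{
	return (flags >> EXEC_CLASS_INSTANCE_SHIFT) & 0xff;
}
```

Joonas's bitmask suggestion below would instead let userspace set several class/instance bits at once, at the cost of a wider field.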

Would it make sense to use bitmask for future proofing? Then we could
allow passing multiple options in the future.

Other than that, looks good. Chris already commented some on the code
itself.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: [Intel-gfx] [PATCH v2 1/4] drm/i915/gvt: reorder the shadow ppgtt update process by adding entry first

2017-05-18 Thread Joonas Lahtinen
On to, 2017-05-18 at 02:23 +, Zhang, Tina wrote:
> 
> > 
> > -Original Message-
> > From: intel-gvt-dev [mailto:intel-gvt-dev-boun...@lists.freedesktop.org] On
> > Behalf Of Joonas Lahtinen
> > Sent: Monday, May 15, 2017 6:50 PM
> > > > To: Zhang, Tina ; 
> > > > intel-gvt-...@lists.freedesktop.org
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH v2 1/4] drm/i915/gvt: reorder the shadow 
> > ppgtt
> > update process by adding entry first
> > 
> > On pe, 2017-05-12 at 17:37 +0800, Tina Zhang wrote:
> > > 
> > > Guest i915 driver may update the ppgtt table just before this workload
> > > is going to be submitted to the hardware by device model. This case
> > > wasn't handled well by device model before, due to the small time
> > > window between removing old ppgtt entry and adding the new one. Errors
> > > occur when the workload is executed by hardware during that small time
> > > window. This patch is to remove this time window by adding the new
> > > ppgtt entry first and then remove the old one.
> > > 
> > > > > > Signed-off-by: Tina Zhang 
> > 
> > This should be reviewed on the GVT mailing list, and this should include the
> > squashed hunk which exports the newly added capability.
> Thanks for the comments. I'm sorry I didn’t get it, here. Do you want
> me to remove this patch from this patch set?

It might be easier to merge if this is sent as two separate patch
series, because one gets merged through the gvt tree and the other
goes directly to tip.

> Well, the reason this patch is here is because we met a GPU hang
> issue when guest i915 using full ppgtt. This is a blocking issue for
> enabling guest i915 full ppgtt functionality. This patch is to fix
> that issue. So, with this patch, the device model can have the
> capability to support guest i915 full ppgtt functionality. And the
> other patches in this patch set are used for guest to communicate
> with host whether the full ppgtt capability is supported.

Yes, but to help bisecting, the capability should be exported in the
same patch that implements the capability.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: [Intel-gfx] [PATCH v2] drm/i915/gvt: disable GVT-g if host GuC submission is enabled

2017-05-18 Thread Joonas Lahtinen
On ti, 2017-05-16 at 07:53 +, Dong, Chuanxiao wrote:
> > 
> > Thanks Joonas. If to fail with -EIO, how about for the other two checks: " 
> > if
> > (!is_supported_device(dev_priv))" and " if (!i915.enable_execlists)"?
> > Currently these two cases are failed with 0 instead of -EIO. Looks like 
> > should
> > also fail with -EIO?

The !is_supported_device check should probably be done in the
parameter sanitization phase; it should not result in -EIO, because
enable_gvt is not marked as an unsafe option.

enable_execlists, on the other hand, is an _unsafe_ option and should
result in -EIO.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: [Intel-gfx] [PATCH 3/4] drm/dp: start a DPCD based DP sink/branch device quirk database

2017-05-18 Thread Jani Nikula
On Wed, 17 May 2017, Clint Taylor  wrote:
> On 05/17/2017 07:25 AM, Jani Nikula wrote:
>> Face the fact, there are Display Port sink and branch devices out there
>> in the wild that don't follow the Display Port specifications, or they
>> have bugs, or just otherwise require special treatment. Start a common
>> quirk database the drivers can query based on the DP device
>> identification. At least for now, we leave the workarounds for the
>> drivers to implement as they see fit.
>>
>> For starters, add a branch device that can't handle full 24-bit main
>> link M and N attributes properly. Naturally, the workaround of reducing
>> main link attributes for all devices ended up in regressions for other
>> devices. So here we are.
>>
>> v2: Rebase on DRM DP desc read helpers
>>
>> Cc: Ville Syrjälä 
>> Cc: Dhinakaran Pandiyan 
>> Cc: Clint Taylor 
>> Cc: Adam Jackson 
>> Cc: Harry Wentland 
>> Signed-off-by: Jani Nikula 
>> ---
>>   drivers/gpu/drm/drm_dp_helper.c | 52 +++--
>>   include/drm/drm_dp_helper.h | 32 +
>>   2 files changed, 82 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
>> index 52e0ca9a5bb1..8c3797283c3b 100644
>> --- a/drivers/gpu/drm/drm_dp_helper.c
>> +++ b/drivers/gpu/drm/drm_dp_helper.c
>> @@ -1209,6 +1209,51 @@ int drm_dp_stop_crc(struct drm_dp_aux *aux)
>>   }
>>   EXPORT_SYMBOL(drm_dp_stop_crc);
>>   
>> +struct dpcd_quirk {
>> +u8 oui[3];
>> +bool is_branch;
>> +u32 quirks;
>> +};
>> +
>> +#define OUI(first, second, third) { (first), (second), (third) }
>> +
>> +static const struct dpcd_quirk dpcd_quirk_list[] = {
>> +/* Analogix 7737 needs reduced M and N at HBR2 link rates */
>> +{ OUI(0x00, 0x22, 0xb9), true, BIT(DP_DPCD_QUIRK_LIMITED_M_N) },
>> +};
>> +
>> +#undef OUI
>> +
>> +/*
>> + * Get a bit mask of DPCD quirks for the sink/branch device identified by
>> + * ident. The quirk data is shared but it's up to the drivers to act on the
>> + * data.
>> + *
>> + * For now, only the OUI (first three bytes) is used, but this may be extended
>> + * to device identification string and hardware/firmware revisions later.
>> + */
>> +static u32
>> +drm_dp_get_quirks(const struct drm_dp_dpcd_ident *ident, bool is_branch)
>> +{
>> +const struct dpcd_quirk *quirk;
>> +u32 quirks = 0;
>> +int i;
>> +
>> +for (i = 0; i < ARRAY_SIZE(dpcd_quirk_list); i++) {
>> +quirk = &dpcd_quirk_list[i];
>> +
>> +if (quirk->is_branch != is_branch)
>> +continue;
>> +
>> +if (memcmp(quirk->oui, ident->oui, sizeof(ident->oui) != 0))
>> +continue;
>
> Oops, all branch devices appear to have the quirk applied. The memcmp's 
> closing parenthesis should come before the != 0.
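The slip is easy to reproduce outside the kernel. With the parenthesis in the wrong place, memcmp's length argument becomes the boolean expression (sizeof(...) != 0), i.e. 1, so only the first OUI byte is compared, and any branch OUI starting with 0x00 "matches" — exactly the false positives in Clint's log below. A minimal standalone demonstration (names are illustrative, not the kernel code):

```c
#include <assert.h>
#include <string.h>

struct ident { unsigned char oui[3]; };

/* Buggy form: the != 0 lands INSIDE memcmp's length argument, so the
 * length is (sizeof(b->oui) != 0) == 1 and only byte 0 is compared. */
static int buggy_match(const struct ident *a, const struct ident *b)
{
	return memcmp(a->oui, b->oui, sizeof(b->oui) != 0) == 0;
}

/* Fixed form: compare all three OUI bytes, then test the result. */
static int fixed_match(const struct ident *a, const struct ident *b)
{
	return memcmp(a->oui, b->oui, sizeof(b->oui)) == 0;
}
```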

Now that's embarrassing. Thanks for catching this.

BR,
Jani.


>
> if (memcmp(quirk->oui, ident->oui, sizeof(ident->oui)) != 0)
>
>
> [  659.496861] [drm:drm_dp_read_desc] DP branch: OUI 00-1c-f8 dev-ID 175IB0 HW-rev 1.0 SW-rev 7.32 quirks 0x0001
> [  659.549017] [drm:drm_dp_read_desc] DP branch: OUI 00-22-b9 dev-ID 7737 HW-rev 10.12 SW-rev 6.4 quirks 0x0001
> [  659.630449] [drm:drm_dp_read_desc] DP sink: OUI ee-ff-c0 dev-ID \001 HW-rev 0.0 SW-rev 0.0 quirks 0x0000
>
> This was actually good to find out that LSPCON didn't like the reduced M 
> and N from the quirk at HBR2.
>
> -Clint
>
>> +
>> +quirks |= quirk->quirks;
>> +}
>> +
>> +return quirks;
>> +}
>> +
>>   /**
>>* drm_dp_read_desc - read sink/branch descriptor from DPCD
>>* @aux: DisplayPort AUX channel
>> @@ -1231,14 +1276,17 @@ int drm_dp_read_desc(struct drm_dp_aux *aux, struct drm_dp_desc *desc,
>>  if (ret < 0)
>>  return ret;
>>   
>> +desc->quirks = drm_dp_get_quirks(ident, is_branch);
>> +
>>  dev_id_len = strnlen(ident->device_id, sizeof(ident->device_id));
>>   
>> -DRM_DEBUG_KMS("DP %s: OUI %*phD dev-ID %*pE HW-rev %d.%d SW-rev %d.%d\n",
>> +DRM_DEBUG_KMS("DP %s: OUI %*phD dev-ID %*pE HW-rev %d.%d SW-rev %d.%d quirks 0x%04x\n",
>>is_branch ? "branch" : "sink",
>>(int)sizeof(ident->oui), ident->oui,
>>dev_id_len, ident->device_id,
>>ident->hw_rev >> 4, ident->hw_rev & 0xf,
>> -  ident->sw_major_rev, ident->sw_minor_rev);
>> +  ident->sw_major_rev, ident->sw_minor_rev,
>> +  desc->quirks);
>>   
>>  return 0;
>>   }
>> diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
>> index aee5b96b51d7..717cb8496725 100644
>> --- a/include/drm/drm_dp_helper.h
>> +++ b/include/drm/drm_dp_helper.h
>> @@ -1090,12 +1090,44 @@ struct drm_dp_dpcd_ident {
>>   /**
>>* struct drm_dp_desc - DP branch/sink device descriptor
>>* @ident: DP device identification from DPCD 0x400 (sink) or 0x500 (branch).
>> + * @quirks: Quirks; us

[Intel-gfx] [PULL] drm-intel-fixes

2017-05-18 Thread Jani Nikula

Hi Dave, drm/i915 fixes for v4.12-rc2.

BR,
Jani.

The following changes since commit 2ea659a9ef488125eb46da6eb571de5eae5c43f6:

  Linux 4.12-rc1 (2017-05-13 13:19:49 -0700)

are available in the git repository at:

  git://anongit.freedesktop.org/git/drm-intel tags/drm-intel-fixes-2017-05-18-1

for you to fetch changes up to 2f720aac936dc7a301b757d3b197d86c333d59b8:

  drm/i915: don't do allocate_va_range again on PIN_UPDATE (2017-05-15 14:44:33 +0300)


drm/i915 fixes for v4.12-rc2


Ander Conselvan de Oliveira (1):
  drm/i915/glk: Fix DSI "*ERROR* ULPS is still active" messages

Chuanxiao Dong (1):
  drm/i915/gvt: not to restore in-context mmio

Colin Ian King (1):
  drm/i915/gvt: fix typo: "supporte" -> "support"

Jani Nikula (1):
  Merge tag 'gvt-fixes-2017-05-11' of https://github.com/01org/gvt-linux into drm-intel-fixes

Matthew Auld (1):
  drm/i915: don't do allocate_va_range again on PIN_UPDATE

Ping Gao (1):
  drm/i915/gvt: avoid unnecessary vgpu switch

Ville Syrjälä (2):
  drm/i915: Fix runtime PM for LPE audio
  drm/i915: Fix rawclk readout for g4x

 drivers/gpu/drm/i915/gvt/handlers.c |  2 +-
 drivers/gpu/drm/i915/gvt/render.c   |  3 +++
 drivers/gpu/drm/i915/gvt/sched_policy.c |  8 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.c | 12 
 drivers/gpu/drm/i915/i915_reg.h | 10 +++---
 drivers/gpu/drm/i915/intel_cdclk.c  |  6 ++
 drivers/gpu/drm/i915/intel_dsi.c|  7 +++
 drivers/gpu/drm/i915/intel_lpe_audio.c  |  5 +
 sound/x86/intel_hdmi_audio.c|  4 
 9 files changed, 35 insertions(+), 22 deletions(-)

-- 
Jani Nikula, Intel Open Source Technology Center


Re: [Intel-gfx] [PATCH v2 4/5] drm/i915/gvt: Dmabuf support for GVT-g

2017-05-18 Thread Chris Wilson
On Thu, May 18, 2017 at 05:50:04PM +0800, Xiaoguang Chen wrote:
> +#define GEN8_DECODE_PTE(pte) \
> + ((dma_addr_t)(((((u64)pte) >> 12) & 0x7ffffffffULL) << 12))
> +
> +static struct sg_table *intel_vgpu_gem_get_pages(
> + struct drm_i915_gem_object *obj)
> +{
> + struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> + struct intel_gvt *gvt = dev_priv->gvt;
> + struct sg_table *st;
> + struct scatterlist *sg;
> + int i, ret;
> + gen8_pte_t __iomem *gtt_entries;
> + struct intel_vgpu *vgpu;
> + unsigned int fb_gma = 0, fb_size = 0;
> + bool found = false;
> +
> + mutex_lock(&gvt->lock);
> + for_each_active_vgpu(gvt, vgpu, i) {
> + if (vgpu->obj_dmabuf == obj) {
> + fb_gma = vgpu->plane_info->start;
> + fb_size = vgpu->plane_info->size;
> + found = true;
> + break;
> + }
> + }
> + mutex_unlock(&gvt->lock);
> +
> + if (!found) {
> + gvt_vgpu_err("no vgpu found\n");
> + return NULL;
> + }
> +
> + st = kmalloc(sizeof(*st), GFP_KERNEL);
> + if (!st) {
> + ret = -ENOMEM;
> + return ERR_PTR(ret);
> + }
> +
> + ret = sg_alloc_table(st, fb_size, GFP_KERNEL);
> + if (ret) {
> + kfree(st);
> + return ERR_PTR(ret);
> + }
> + gtt_entries = (gen8_pte_t __iomem *)dev_priv->ggtt.gsm +
> + (fb_gma >> PAGE_SHIFT);
> + for_each_sg(st->sgl, sg, fb_size, i) {
> + sg->offset = 0;
> + sg->length = PAGE_SIZE;
> + sg_dma_address(sg) =
> + GEN8_DECODE_PTE(readq(&gtt_entries[i]));
> + sg_dma_len(sg) = PAGE_SIZE;
> + }
> +
> + return st;
> +}
> +
> +static void intel_vgpu_gem_put_pages(struct drm_i915_gem_object *obj,
> + struct sg_table *pages)
> +{
> + struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> + struct intel_gvt *gvt = dev_priv->gvt;
> + int i;
> + struct intel_vgpu *vgpu;
> +
> + mutex_lock(&gvt->lock);
> + for_each_active_vgpu(gvt, vgpu, i) {
> + if (vgpu->obj_dmabuf == obj) {
> + kfree(vgpu->plane_info);
> + break;
> + }

We have a union for private data in i915_gem_object that you can use to
store the backpointer.

> + }
> + mutex_unlock(&gvt->lock);
> +
> + sg_free_table(pages);
> + kfree(pages);
> +
> + i915_gem_object_unpin_pages(obj);

That's confused and should be triggering at least 2 WARNs when the
object is freed.

> +}
> +
> +static const struct drm_i915_gem_object_ops intel_vgpu_gem_ops = {
> + .get_pages = intel_vgpu_gem_get_pages,
> + .put_pages = intel_vgpu_gem_put_pages,
> +};
> +
> +static struct drm_i915_gem_object *intel_vgpu_create_gem(struct drm_device 
> *dev,
> + struct intel_vgpu *vgpu)
> +{
> + struct drm_i915_private *pri = dev->dev_private;
> + struct drm_i915_gem_object *obj;
> + struct intel_vgpu_plane_info *info = vgpu->plane_info;
> +
> + obj = i915_gem_object_alloc(pri);
> + if (obj == NULL)
> + return NULL;
> +
> + drm_gem_private_object_init(dev, &obj->base,
> + info->size << PAGE_SHIFT);
> + i915_gem_object_init(obj, &intel_vgpu_gem_ops);
> +
> + obj->base.read_domains = I915_GEM_DOMAIN_GTT;
> + obj->base.write_domain = 0;
> +
> + if (IS_SKYLAKE(pri)) {
> + unsigned int tiling_mode = 0;
> +
> + switch (info->tiled << 10) {
> + case PLANE_CTL_TILED_LINEAR:
> + tiling_mode = I915_TILING_NONE;
> + break;
> + case PLANE_CTL_TILED_X:
> + tiling_mode = I915_TILING_X;
> + break;
> + case PLANE_CTL_TILED_Y:
> + tiling_mode = I915_TILING_Y;
> + break;
> + default:
> + gvt_vgpu_err("tile %d not supported\n", info->tiled);
> + }
> + obj->tiling_and_stride = tiling_mode | info->stride;
> + } else {
> + obj->tiling_and_stride = info->tiled ? I915_TILING_X : 0;
> + }

How good a first class object is this? If I understand this correctly,
we have a valid DMA mapping for the object, so we can access it via GTT
mmaps and rendering on the GPU. We just can't use CPU mmaps.

However, we should reject things like tiling changes, cache level
changes, set-domain is also problematic.

> +
> + return obj;
> +}

> +int intel_vgpu_create_dmabuf(struct intel_vgpu *vgpu, void *args)
> +{
> + struct dma_buf *dmabuf;
> + struct drm_i915_gem_object *obj;
> + struct drm_device *dev = &vgpu->gvt->dev_priv->drm;
> + struct intel_vgpu_dmabuf *gvt_dmabuf = args;
> + struct intel_vgpu_plane_info *info;
> + int ret;
> +
> + info = intel_vgpu_get_plane_info(dev, vg

Re: [Intel-gfx] Call trace on 4.12.0-rc1

2017-05-18 Thread Hans de Goede

Hi,

On 17-05-17 11:36, Ville Syrjälä wrote:

On Tue, May 16, 2017 at 10:43:39PM +0200, Hans de Goede wrote:

Hi,

On 05/16/2017 09:55 PM, FKr wrote:

Hi,
I'm using 4.12.0-rc1 from https://github.com/jwrdegoede/linux-sunxi and got
the following weird trace yesterday. Previously I've been getting output
similar to https://www.spinics.net/lists/intel-gfx/msg127638.html; on some
boots of 4.12.0-rc1 I don't get any trace at all.


This is really weird: we are getting the error while we are trying to
acquire the wakelock...? Or do we need some other lock before we can
take the wakelock?


You would also need the runtime pm reference. IIRC we unregister the
notifier in runtime suspend, so I think intel_runtime_pm_get_noresume()
should be OK in this case. Imre?


Thank you for the reply, and you're right about not registering the
notifier on runtime-resume; I will send a fix for that.

Now about the oops this thread is about, I believe this is triggered
by a race condition, intel_runtime_pm_put() does:

atomic_dec(&dev_priv->pm.wakeref_count);

pm_runtime_mark_last_busy(kdev);
pm_runtime_put_autosuspend(kdev);

And pm_runtime_put_autosuspend() calls intel_runtime_suspend() from
a workqueue, so there is ample time between the atomic_dec() and
intel_runtime_suspend() unregistering the notifier for this oops to
happen. So this really is a false positive.

Which I believe you already realized and is why you suggested using
intel_runtime_pm_get_noresume().

It took me a while to realize that this is a false positive, that all we
really want to do is silence the message, and that we don't really need
to take a runtime_pm reference, because we unregister the notifier
(yes, I wrote those bits myself, but I've done a lot of other stuff
since).

Unfortunately calling intel_runtime_pm_get_noresume() as you suggest
will not work because it contains an assert_rpm_wakelock_held()
call itself.

So I believe that the only way to silence the false-positive WARN_ON
would be for the notifier to call disable_rpm_wakeref_asserts()
and enable_rpm_wakeref_asserts() around the
intel_uncore_forcewake_get(FORCEWAKE_ALL) call; would that be an
acceptable fix?

Note that we would then still have the
assert_rpm_device_not_suspended() check implied by
assert_rpm_wakelock_held() to ensure that we're
not calling intel_uncore_forcewake_get during suspend.

Regards,

Hans




[ 2383.844192] perf: interrupt took too long (2522 > 2500), lowering
kernel.perf_event_max_sample_rate to 79200
[ 2634.863978] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on
pipe A (start=157909 end=157910) time 322 us, min 1073, max 1079, scanline
start 1063, end 1084
[ 2647.881794] perf: interrupt took too long (3193 > 3152), lowering
kernel.perf_event_max_sample_rate to 62400
[ 3297.857921] perf: interrupt took too long (4020 > 3991), lowering
kernel.perf_event_max_sample_rate to 49500
[ 4670.977136] mmc0: Tuning timeout, falling back to fixed sampling clock
[ 4671.436604] mmc0: Tuning timeout, falling back to fixed sampling clock
[ 4680.756302] mmc0: Tuning timeout, falling back to fixed sampling clock
[ 4707.846872] perf: interrupt took too long (5046 > 5025), lowering
kernel.perf_event_max_sample_rate to 39600
[ 4846.672969] RPM wakelock ref not held during HW access
[ 4846.673050] [ cut here ]
[ 4846.673084] WARNING: CPU: 0 PID: 5227 at drivers/gpu/drm/i915/intel_drv.h:
1780 intel_uncore_forcewake_get+0xa0/0xb0
[ 4846.673088] Modules linked in: snd_soc_sst_cht_bsw_nau8824 btusb btintel
bluetooth axp288_fuel_gauge ecdh_generic axp288_charger extcon_axp288
axp288_adc snd_hdmi_lpe_audio snd_intel_sst_acpi extcon_intel_int3496
snd_intel_sst_core extcon_core snd_soc_nau8824 snd_soc_sst_atom_hifi2_platform
snd_soc_core snd_compress snd_soc_sst_match snd_pcm snd_timer kxcjk_1013
industrialio_triggered_buffer intel_cht_int33fe snd soundcore intel_int0002
[ 4846.673201] CPU: 0 PID: 5227 Comm: kworker/0:1 Not tainted 4.12.0-rc1+ #3
[ 4846.673206] Hardware name: MEDION E2228T MD60250/NT16H, BIOS 5.11
02/27/2017
[ 4846.673224] Workqueue: events fuel_gauge_status_monitor [axp288_fuel_gauge]
[ 4846.673235] task: 8800a85be800 task.stack: c9000261
[ 4846.673248] RIP: 0010:intel_uncore_forcewake_get+0xa0/0xb0
[ 4846.673255] RSP: 0018:c90002613aa0 EFLAGS: 00010286
[ 4846.673265] RAX: 002a RBX: 880136cc8000 RCX:
82063e88
[ 4846.673272] RDX:  RSI: 0082 RDI:
0247
[ 4846.673278] RBP: c90002613ac0 R08: 002a R09:
02ac
[ 4846.673284] R10: 0001 R11:  R12:
0007
[ 4846.673289] R13: 0001 R14:  R15:

[ 4846.673298] FS:  () GS:88013fc0() knlGS:

[ 4846.673305] CS:  0010 DS:  ES:  CR0: 80050033
[ 4846.673311] CR2: 0758c709c000 CR3: 02009000 CR4:
001006f0
[ 4846.673317] Ca

[Intel-gfx] [PATCH 0/2] Remaining patches for WM cleanup series

2017-05-18 Thread Mahesh Kumar
This series contains remaining two patches from wm cleanup series
https://patchwork.freedesktop.org/series/20152/

Initial 10 patches already got merged in tree so sending remaining 2
separately.

Kumar, Mahesh (2):
  drm/i915/skl: New ddb allocation algorithm
  drm/i915/skl+: consider max supported plane pixel rate while scaling

 drivers/gpu/drm/i915/intel_display.c |   3 +
 drivers/gpu/drm/i915/intel_drv.h |   2 +
 drivers/gpu/drm/i915/intel_pm.c  | 343 +--
 3 files changed, 248 insertions(+), 100 deletions(-)

-- 
2.11.0



[Intel-gfx] [PATCH 2/2] drm/i915/skl+: consider max supported plane pixel rate while scaling

2017-05-18 Thread Mahesh Kumar
From: "Kumar, Mahesh" 

A display resolution is only supported if it meets all the restrictions
below for Maximum Pipe Pixel Rate.

The display resolution must fit within the maximum pixel rate output
from the pipe. Make sure that the display pipe is able to feed pixels at
a rate required to support the desired resolution.
For each enabled plane on the pipe {
If plane scaling enabled {
Horizontal down scale amount = Maximum[1, plane horizontal size /
scaler horizontal window size]
Vertical down scale amount = Maximum[1, plane vertical size /
scaler vertical window size]
Plane down scale amount = Horizontal down scale amount *
Vertical down scale amount
Plane Ratio = 1 / Plane down scale amount
}
Else {
Plane Ratio = 1
}
If plane source pixel format is 64 bits per pixel {
Plane Ratio = Plane Ratio * 8/9
}
}

Pipe Ratio = Minimum Plane Ratio of all enabled planes on the pipe

If pipe scaling is enabled {
Horizontal down scale amount = Maximum[1, pipe horizontal source size /
scaler horizontal window size]
Vertical down scale amount = Maximum[1, pipe vertical source size /
scaler vertical window size]
Note: The progressive fetch - interlace display mode is equivalent to a
2.0 vertical down scale
Pipe down scale amount = Horizontal down scale amount *
Vertical down scale amount
Pipe Ratio = Pipe Ratio / Pipe down scale amount
}

Pipe maximum pixel rate = CDCLK frequency * Pipe Ratio

In this patch our calculation is based on pipe downscale amount
(plane max downscale amount * pipe downscale amount) instead of Pipe
Ratio. So,
max supported crtc clock with given scaling = CDCLK / pipe downscale.
A flip will fail if
current crtc clock > max supported crtc clock with the given scaling.

Changes since V1:
 - separate out fixed_16_16 wrapper API definition
Changes since V2:
 - Fix buggy crtc !active condition (Maarten)
 - use intel_wm_plane_visible wrapper as per Maarten's suggestion
Changes since V3:
 - Change failure return from ERANGE to EINVAL
Changes since V4:
 - Rebase based on previous patch changes
Changes since V5:
 - return EINVAL instead of continue (Maarten)
Changes since V6:
 - Improve commit message
 - Address review comment

Signed-off-by: Mahesh Kumar 
Reviewed-by: Matt Roper 
Reviewed-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/intel_display.c |  3 ++
 drivers/gpu/drm/i915/intel_drv.h |  2 +
 drivers/gpu/drm/i915/intel_pm.c  | 87 
 3 files changed, 92 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 8217ed0e7132..df0b3e77129b 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11177,6 +11177,9 @@ static int intel_crtc_atomic_check(struct drm_crtc 
*crtc,
ret = skl_update_scaler_crtc(pipe_config);
 
if (!ret)
+   ret = skl_check_pipe_max_pixel_rate(intel_crtc,
+   pipe_config);
+   if (!ret)
ret = intel_atomic_setup_scalers(dev_priv, intel_crtc,
 pipe_config);
}
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index bd500977b3fc..93afac4a83fa 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1885,6 +1885,8 @@ bool skl_ddb_allocation_overlaps(const struct 
skl_ddb_entry **entries,
 int ignore);
 bool ilk_disable_lp_wm(struct drm_device *dev);
 int sanitize_rc6_option(struct drm_i915_private *dev_priv, int enable_rc6);
+int skl_check_pipe_max_pixel_rate(struct intel_crtc *intel_crtc,
+ struct intel_crtc_state *cstate);
 static inline int intel_enable_rc6(void)
 {
return i915.enable_rc6;
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index fb4cec3fb92c..4edc5f4ee598 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3863,6 +3863,93 @@ skl_plane_downscale_amount(const struct intel_crtc_state 
*cstate,
return mul_fixed16(downscale_w, downscale_h);
 }
 
+static uint_fixed_16_16_t
+skl_pipe_downscale_amount(const struct intel_crtc_state *config)
+{
+   uint_fixed_16_16_t pipe_downscale = u32_to_fixed_16_16(1);
+
+   if (!config->base.active)
+   return pipe_downscale;
+
+   if (config->pch_pfit.enabled) {
+   uint32_t src_w, src_h, dst_w, dst_h;
+   uint32_t pfit_size = config->pch_pfit.size;
+   uint_fixed_16_16_t fp_w_ratio, fp_h_ratio;
+   uint_fixed_16_16_t downscale_h, downscale_w;
+
+   src_w = config->pipe_src_w;
+   sr

[Intel-gfx] [PATCH 1/2] drm/i915/skl: New ddb allocation algorithm

2017-05-18 Thread Mahesh Kumar
From: "Kumar, Mahesh" 

This patch implements the new DDB allocation algorithm recommended by the
HW team. The algorithm takes care of the scenario where planes with a
lower relative pixel rate would otherwise be allocated less DDB than they
require to work.
It also enables the same watermark level for each plane on a crtc, for
efficient power saving.

Changes since v1:
 - Rebase on top of Paulo's patch series

Changes since v2:
 - Fix the for loop condition to enable WM

Changes since v3:
 - Fix crash in cursor i-g-t reported by Maarten
 - Rebase after addressing Paulo's comments
 - Few other ULT fixes
Changes since v4:
 - Rebase on drm-tip
 - Added separate function to enable WM levels
Changes since v5:
 - Fix a crash identified in skl-6770HQ system
Changes since v6:
 - Address review comments from Matt
Changes since v7:
 - Fix failure return in skl_compute_plane_wm (Matt)
 - fix typo

Signed-off-by: Mahesh Kumar 
Reviewed-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/intel_pm.c | 256 
 1 file changed, 156 insertions(+), 100 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 936eef1634c7..fb4cec3fb92c 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4039,13 +4039,41 @@ skl_ddb_calc_min(const struct intel_crtc_state *cstate, 
int num_active,
minimum[PLANE_CURSOR] = skl_cursor_allocation(num_active);
 }
 
+static void
+skl_enable_plane_wm_levels(const struct drm_i915_private *dev_priv,
+  uint16_t plane_ddb,
+  uint16_t max_level,
+  struct skl_plane_wm *wm)
+{
+   int level;
+   /*
+* Now enable all levels in WM structure which can be enabled
+* using current DDB allocation
+*/
+   for (level = ilk_wm_max_level(dev_priv); level >= 0; level--) {
+   struct skl_wm_level *level_wm = &wm->wm[level];
+
+   if (level > max_level || level_wm->plane_res_b == 0
+ || level_wm->plane_res_l >= 31
+ || level_wm->plane_res_b >= plane_ddb) {
+   level_wm->plane_en = false;
+   level_wm->plane_res_b = 0;
+   level_wm->plane_res_l = 0;
+   } else {
+   level_wm->plane_en = true;
+   }
+   }
+}
+
 static int
 skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
+ struct skl_pipe_wm *pipe_wm,
  struct skl_ddb_allocation *ddb /* out */)
 {
struct drm_atomic_state *state = cstate->base.state;
struct drm_crtc *crtc = cstate->base.crtc;
struct drm_device *dev = crtc->dev;
+   struct drm_i915_private *dev_priv = to_i915(dev);
struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
enum pipe pipe = intel_crtc->pipe;
struct skl_ddb_entry *alloc = &cstate->wm.skl.ddb;
@@ -4058,6 +4086,8 @@ skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
unsigned plane_data_rate[I915_MAX_PLANES] = {};
unsigned plane_y_data_rate[I915_MAX_PLANES] = {};
uint16_t total_min_blocks = 0;
+   uint16_t total_level_ddb;
+   int max_level, level;
 
/* Clear the partitioning for disabled planes. */
memset(ddb->plane[pipe], 0, sizeof(ddb->plane[pipe]));
@@ -4096,10 +4126,48 @@ skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
return -EINVAL;
}
 
-   alloc_size -= total_min_blocks;
-   ddb->plane[pipe][PLANE_CURSOR].start = alloc->end - 
minimum[PLANE_CURSOR];
+   alloc_size -= minimum[PLANE_CURSOR];
+   ddb->plane[pipe][PLANE_CURSOR].start = alloc->end -
+   minimum[PLANE_CURSOR];
ddb->plane[pipe][PLANE_CURSOR].end = alloc->end;
 
+   for (level = ilk_wm_max_level(dev_priv); level >= 0; level--) {
+   total_level_ddb = 0;
+   for_each_plane_id_on_crtc(intel_crtc, plane_id) {
+   /*
+* TODO: We should calculate watermark values for Y/UV
+* plane both in case of NV12 format and use both values
+* for ddb calculation. NV12 is disabled as of now, So
+* using only single/UV plane value here.
+*/
+   struct skl_plane_wm *wm = &pipe_wm->planes[plane_id];
+   uint16_t plane_res_b = wm->wm[level].plane_res_b;
+   uint16_t min = minimum[plane_id] + y_minimum[plane_id];
+
+   if (plane_id == PLANE_CURSOR)
+   continue;
+
+   total_level_ddb += max(plane_res_b, min);
+   }
+
+   /*
+* If This level can successfully be enabled with the
+* pipe's c

Re: [Intel-gfx] [PATCH v3] drm: Add DRM_MODE_ROTATE_ and DRM_MODE_REFLECT_ to UAPI

2017-05-18 Thread Liviu Dudau
On Wed, May 17, 2017 at 09:39:11PM -0400, Robert Foss wrote:
> Add DRM_MODE_ROTATE_ and DRM_MODE_REFLECT_ defines to the UAPI
> as a convenience.
> 
> Ideally the DRM_ROTATE_ and DRM_REFLECT_ property ids are looked up
> through the atomic API, but realizing that userspace is likely to take
> shortcuts and assume that the enum values are what is sent over the
> wire.
> 
> As a result these defines are provided purely as a convenience to
> userspace applications.
> 
> Signed-off-by: Robert Foss 
> ---
> Changes since v2:
>  - Changed define prefix from DRM_MODE_PROP_ to DRM_MODE_
>  - Fix compilation errors
>  - Changed comment formatting
>  - Deduplicated comment lines
>  - Clarified DRM_MODE_PROP_REFLECT_ comment
> 
> Changes since v1:
>  - Moved defines from drm.h to drm_mode.h
>  - Changed define prefix from DRM_ to DRM_MODE_PROP_ 
>  - Updated uses of the defines to the new prefix
>  - Removed include from drm_rect.c
>  - Stopped using the BIT() macro 
> 
>  drivers/gpu/drm/arm/malidp_drv.h|  2 +-
>  drivers/gpu/drm/arm/malidp_planes.c | 18 -
>  drivers/gpu/drm/armada/armada_overlay.c |  2 +-
>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c | 20 +-
>  drivers/gpu/drm/drm_atomic.c|  2 +-
>  drivers/gpu/drm/drm_atomic_helper.c |  2 +-
>  drivers/gpu/drm/drm_blend.c | 43 ++---
>  drivers/gpu/drm/drm_fb_helper.c |  4 +-
>  drivers/gpu/drm/drm_plane_helper.c  |  2 +-
>  drivers/gpu/drm/drm_rect.c  | 36 +-
>  drivers/gpu/drm/i915/i915_debugfs.c | 14 +++
>  drivers/gpu/drm/i915/intel_atomic_plane.c   |  6 +--
>  drivers/gpu/drm/i915/intel_display.c| 50 
> -
>  drivers/gpu/drm/i915/intel_fbc.c|  2 +-
>  drivers/gpu/drm/i915/intel_fbdev.c  |  2 +-
>  drivers/gpu/drm/i915/intel_sprite.c | 20 +-
>  drivers/gpu/drm/imx/ipuv3-plane.c   |  2 +-
>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c   | 30 +++
>  drivers/gpu/drm/nouveau/nv50_display.c  |  2 +-
>  drivers/gpu/drm/omapdrm/omap_drv.c  |  4 +-
>  drivers/gpu/drm/omapdrm/omap_fb.c   | 18 -
>  drivers/gpu/drm/omapdrm/omap_plane.c| 16 
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c |  4 +-
>  include/drm/drm_blend.h | 21 +--
>  include/uapi/drm/drm_mode.h | 49 +++-
>  25 files changed, 201 insertions(+), 170 deletions(-)
> 
> diff --git a/drivers/gpu/drm/arm/malidp_drv.h 
> b/drivers/gpu/drm/arm/malidp_drv.h
> index 040311ffcaec..2e2033140efc 100644
> --- a/drivers/gpu/drm/arm/malidp_drv.h
> +++ b/drivers/gpu/drm/arm/malidp_drv.h
> @@ -65,6 +65,6 @@ void malidp_de_planes_destroy(struct drm_device *drm);
>  int malidp_crtc_init(struct drm_device *drm);
>  
>  /* often used combination of rotational bits */
> -#define MALIDP_ROTATED_MASK  (DRM_ROTATE_90 | DRM_ROTATE_270)
> +#define MALIDP_ROTATED_MASK  (DRM_MODE_ROTATE_90 | DRM_MODE_ROTATE_270)
>  
>  #endif  /* __MALIDP_DRV_H__ */
> diff --git a/drivers/gpu/drm/arm/malidp_planes.c 
> b/drivers/gpu/drm/arm/malidp_planes.c
> index 814fda23cead..063a8d2b0be3 100644
> --- a/drivers/gpu/drm/arm/malidp_planes.c
> +++ b/drivers/gpu/drm/arm/malidp_planes.c
> @@ -80,7 +80,7 @@ static void malidp_plane_reset(struct drm_plane *plane)
>   state = kzalloc(sizeof(*state), GFP_KERNEL);
>   if (state) {
>   state->base.plane = plane;
> - state->base.rotation = DRM_ROTATE_0;
> + state->base.rotation = DRM_MODE_ROTATE_0;
>   plane->state = &state->base;
>   }
>  }
> @@ -221,7 +221,7 @@ static int malidp_de_plane_check(struct drm_plane *plane,
>   return ret;
>  
>   /* packed RGB888 / BGR888 can't be rotated or flipped */
> - if (state->rotation != DRM_ROTATE_0 &&
> + if (state->rotation != DRM_MODE_ROTATE_0 &&
>   (fb->format->format == DRM_FORMAT_RGB888 ||
>fb->format->format == DRM_FORMAT_BGR888))
>   return -EINVAL;
> @@ -315,12 +315,12 @@ static void malidp_de_plane_update(struct drm_plane 
> *plane,
>   val &= ~LAYER_ROT_MASK;
>  
>   /* setup the rotation and axis flip bits */
> - if (plane->state->rotation & DRM_ROTATE_MASK)
> - val |= ilog2(plane->state->rotation & DRM_ROTATE_MASK) <<
> + if (plane->state->rotation & DRM_MODE_ROTATE_MASK)
> + val |= ilog2(plane->state->rotation & DRM_MODE_ROTATE_MASK) <<
>  LAYER_ROT_OFFSET;
> - if (plane->state->rotation & DRM_REFLECT_X)
> + if (plane->state->rotation & DRM_MODE_REFLECT_X)
>   val |= LAYER_H_FLIP;
> - if (plane->state->rotation & DRM_REFLECT_Y)
> + if (plane->state->rotation & DRM_MODE_REFLECT_Y)
>   val |= LAYER_V_FLI

Re: [Intel-gfx] [PATCH v5 3/3] drm/i915: Add format modifiers for Intel

2017-05-18 Thread Emil Velikov
On 18 May 2017 at 02:14, Ben Widawsky  wrote:
> On 17-05-17 01:20:50, Emil Velikov wrote:
>>
>> Hi Ben,
>>
>> A couple of small questions/suggestions that I hope you find useful.
>> Please don't block any of this work based on my comments.
>>
>> On 16 May 2017 at 22:31, Ben Widawsky  wrote:
>>
>>> +static bool intel_primary_plane_format_mod_supported(struct drm_plane
>>> *plane,
>>> +uint32_t format,
>>> +uint64_t modifier)
>>> +{
>>> +   struct drm_i915_private *dev_priv = to_i915(plane->dev);
>>> +
>>> +   if (WARN_ON(modifier == DRM_FORMAT_MOD_INVALID))
>>> +   return false;
>>> +
>>> +   if (INTEL_GEN(dev_priv) >= 9)
>>> +   return skl_mod_supported(format, modifier);
>>> +   else if (INTEL_GEN(dev_priv) >= 4)
>>> +   return i965_mod_supported(format, modifier);
>>> +   else
>>> +   return i8xx_mod_supported(format, modifier);
>>> +
>>
>> Nit: if you rewrite this as below, the control flow should be clearer.
>> Thus the 'return false;' at the end, will be [more] obvious that it's
>> unreachable ;-)
>>
>>   if (INTEL_GEN(dev_priv) >= 9)
>>   return skl_mod_supported(format, modifier);
>>
>>   if (INTEL_GEN(dev_priv) >= 4)
>>   return i965_mod_supported(format, modifier);
>>
>>   return i8xx_mod_supported(format, modifier);
>>
>>
>
> It's a good point; however, many other blocks of code do the same thing,
> and I
> think the nature of if/else if/else implies unreachable. I can add
> unreachable()... In fact, I just did.
>
>
The "implies" argument is a bit odd, but that's just bikesheding, so
feel free to ignore me.

>>> +   return false;
>>> +}
>>> +
>>
>>
>>
>>> --- a/drivers/gpu/drm/i915/intel_sprite.c
>>> +++ b/drivers/gpu/drm/i915/intel_sprite.c
>>
>>
>>> +static bool intel_sprite_plane_format_mod_supported(struct drm_plane
>>> *plane,
>>> +uint32_t format,
>>> +uint64_t modifier)
>>> +{
>>> +   struct drm_i915_private *dev_priv = to_i915(plane->dev);
>>> +
>>> +   if (WARN_ON(modifier == DRM_FORMAT_MOD_INVALID))
>>> +   return false;
>>> +
>>> +   BUG_ON(plane->base.type != DRM_PLANE_TYPE_OVERLAY);
>>> +
>>> +   switch (format) {
>>> +   case DRM_FORMAT_XBGR2101010:
>>> +   case DRM_FORMAT_ABGR2101010:
>>> +   if (IS_VALLEYVIEW(dev_priv) ||
>>> IS_CHERRYVIEW(dev_priv))
>>> +   return true;
>>> +   break;
>>> +   case DRM_FORMAT_RGB565:
> +   case DRM_FORMAT_ABGR8888:
> +   case DRM_FORMAT_ARGB8888:
>>> +   if (INTEL_GEN(dev_priv) >= 9 ||
>>> IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
>>> +   return true;
>>> +   break;
> +   case DRM_FORMAT_XBGR8888:
>>> +   if (INTEL_GEN(dev_priv) >= 6)
>>> +   return true;
>>> +   break;
> +   case DRM_FORMAT_XRGB8888:
>>> +   case DRM_FORMAT_YUYV:
>>> +   case DRM_FORMAT_YVYU:
>>> +   case DRM_FORMAT_UYVY:
>>> +   case DRM_FORMAT_VYUY:
>>> +   return true;
>>> +   }
>>> +
>>> +   return false;
>>> +}
>>> +
>>> +const struct drm_plane_funcs intel_sprite_plane_funcs = {
>>> +.update_plane = drm_atomic_helper_update_plane,
>>> +.disable_plane = drm_atomic_helper_disable_plane,
>>> +.destroy = intel_plane_destroy,
>>> +.set_property = drm_atomic_helper_plane_set_property,
>>> +.atomic_get_property = intel_plane_atomic_get_property,
>>> +.atomic_set_property = intel_plane_atomic_set_property,
>>> +.atomic_duplicate_state = intel_plane_duplicate_state,
>>> +.atomic_destroy_state = intel_plane_destroy_state,
>>> +.format_mod_supported = intel_sprite_plane_format_mod_supported,
>>> +};
>>> +
>>
>> Having a dull moment - is intel_sprite_plane_funcs (+
>> intel_sprite_plane_format_mod_supported) unused, or am I seeing things?
>> If one is to keep it for future work, perhaps it's worth adding a 2-3
>> word comment.
>>
>> Regards,
>> Emil
>
>
> Not sure how this happened, it is a mistake.
I've spotted an actual bug alongside my bikeshedding, Yay!

Fwiw
Reviewed-by: Emil Velikov 

-Emil


[Intel-gfx] [PATCH v2 4/5] drm/i915/gvt: Dmabuf support for GVT-g

2017-05-18 Thread Xiaoguang Chen
dmabuf for GVT-g can be exported to users, who can use the dmabuf to show
the desktop of a VM that uses an Intel vGPU.

Currently we provide query and create-new-dmabuf operations.

Users of the dmabuf can cache created dmabufs together with related
information such as the framebuffer's address, size, tiling mode, width,
height etc. When refreshing the screen, first query the current vgpu's
framebuffer and compare it with the cached ones (address, size, tiling,
width, height etc); if a match is found, reuse that dmabuf to gain a
performance improvement.

If no dmabuf has been created yet, or none is found among the cached
dmabufs, a new dmabuf needs to be created. To create a dmabuf, first a gem
object is created whose backing storage is the vgpu's framebuffer
(primary/cursor). Then this gem object is associated with a dmabuf and the
dmabuf is exported. A file descriptor is generated for the dmabuf, and this
file descriptor can be sent to user space to do the display.

Signed-off-by: Xiaoguang Chen 
---
 drivers/gpu/drm/i915/gvt/Makefile |   2 +-
 drivers/gpu/drm/i915/gvt/dmabuf.c | 321 ++
 drivers/gpu/drm/i915/gvt/dmabuf.h |  44 ++
 drivers/gpu/drm/i915/gvt/gvt.h|   3 +
 include/uapi/drm/i915_drm.h   |  21 +++
 5 files changed, 390 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/gvt/dmabuf.c
 create mode 100644 drivers/gpu/drm/i915/gvt/dmabuf.h

diff --git a/drivers/gpu/drm/i915/gvt/Makefile 
b/drivers/gpu/drm/i915/gvt/Makefile
index 192ca26..e480f7d 100644
--- a/drivers/gpu/drm/i915/gvt/Makefile
+++ b/drivers/gpu/drm/i915/gvt/Makefile
@@ -2,7 +2,7 @@ GVT_DIR := gvt
 GVT_SOURCE := gvt.o aperture_gm.o handlers.o vgpu.o trace_points.o firmware.o \
interrupt.o gtt.o cfg_space.o opregion.o mmio.o display.o edid.o \
execlist.o scheduler.o sched_policy.o render.o cmd_parser.o \
-   fb_decoder.o
+   fb_decoder.o dmabuf.o
 
 ccflags-y  += -I$(src) -I$(src)/$(GVT_DIR) -Wall
 i915-y += $(addprefix $(GVT_DIR)/, 
$(GVT_SOURCE))
diff --git a/drivers/gpu/drm/i915/gvt/dmabuf.c 
b/drivers/gpu/drm/i915/gvt/dmabuf.c
new file mode 100644
index 000..3358e6f
--- /dev/null
+++ b/drivers/gpu/drm/i915/gvt/dmabuf.c
@@ -0,0 +1,321 @@
+/*
+ * Copyright 2017 Intel Corporation. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *Zhiyuan Lv 
+ *
+ * Contributors:
+ *Xiaoguang Chen 
+ */
+
+#include 
+#include 
+
+#include "i915_drv.h"
+#include "gvt.h"
+
+#define GEN8_DECODE_PTE(pte) \
+   ((dma_addr_t)(((((u64)pte) >> 12) & 0x7ffffffULL) << 12))
+
+static struct sg_table *intel_vgpu_gem_get_pages(
+   struct drm_i915_gem_object *obj)
+{
+   struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
+   struct intel_gvt *gvt = dev_priv->gvt;
+   struct sg_table *st;
+   struct scatterlist *sg;
+   int i, ret;
+   gen8_pte_t __iomem *gtt_entries;
+   struct intel_vgpu *vgpu;
+   unsigned int fb_gma = 0, fb_size = 0;
+   bool found = false;
+
+   mutex_lock(&gvt->lock);
+   for_each_active_vgpu(gvt, vgpu, i) {
+   if (vgpu->obj_dmabuf == obj) {
+   fb_gma = vgpu->plane_info->start;
+   fb_size = vgpu->plane_info->size;
+   found = true;
+   break;
+   }
+   }
+   mutex_unlock(&gvt->lock);
+
+   if (!found) {
+   gvt_vgpu_err("no vgpu found\n");
+   return NULL;
+   }
+
+   st = kmalloc(sizeof(*st), GFP_KERNEL);
+   if (!st) {
+   ret = -ENOMEM;
+   return ERR_PTR(ret);
+   }
+
+   ret = sg_alloc_table(st, fb_size, GFP_KERNEL);
+   if (ret) {
+   kfree(st);
+   return ERR_PTR(ret);
+   }
+   gtt_entries = (gen8_pte_t __iom

[Intel-gfx] [PATCH v2 5/5] drm/i915/gvt: Adding interface so user space can get the dma-buf

2017-05-18 Thread Xiaoguang Chen
User space will create a management fd for the dma-buf operations.
Using this management fd, user space can query the plane information and
create a dma-buf fd if necessary.
GVT-g handles the life cycle of the management fd and aligns it with the
life cycle of the vfio device.
User space is responsible for the life cycle of the created dma-buf fd and
should close it promptly when done using it.

Signed-off-by: Xiaoguang Chen 
---
 drivers/gpu/drm/i915/gvt/gvt.c   |  2 +
 drivers/gpu/drm/i915/gvt/gvt.h   |  3 ++
 drivers/gpu/drm/i915/gvt/kvmgt.c | 89 
 include/uapi/drm/i915_drm.h  |  2 +
 include/uapi/linux/vfio.h| 12 ++
 5 files changed, 108 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/gvt.c b/drivers/gpu/drm/i915/gvt/gvt.c
index 2032917..48e04e6 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.c
+++ b/drivers/gpu/drm/i915/gvt/gvt.c
@@ -54,6 +54,8 @@
.vgpu_reset = intel_gvt_reset_vgpu,
.vgpu_activate = intel_gvt_activate_vgpu,
.vgpu_deactivate = intel_gvt_deactivate_vgpu,
+   .vgpu_query_dmabuf = intel_vgpu_query_dmabuf,
+   .vgpu_create_dmabuf = intel_vgpu_create_dmabuf,
 };
 
 /**
diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index a553120..b7fdfd5 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -185,6 +185,7 @@ struct intel_vgpu {
struct kvm *kvm;
struct work_struct release_work;
atomic_t released;
+   struct vfio_device *vfio_device;
} vdev;
 #endif
struct intel_vgpu_plane_info *plane_info;
@@ -469,6 +470,8 @@ struct intel_gvt_ops {
void (*vgpu_reset)(struct intel_vgpu *);
void (*vgpu_activate)(struct intel_vgpu *);
void (*vgpu_deactivate)(struct intel_vgpu *);
+   int (*vgpu_query_dmabuf)(struct intel_vgpu *, void *);
+   int (*vgpu_create_dmabuf)(struct intel_vgpu *, void *);
 };
 
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 389f072..9a663df 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "i915_drv.h"
 #include "gvt.h"
@@ -524,6 +525,66 @@ static int intel_vgpu_reg_init_opregion(struct intel_vgpu *vgpu)
return ret;
 }
 
+static int intel_vgpu_dmabuf_mgr_fd_mmap(struct file *file,
+   struct vm_area_struct *vma)
+{
+   return -EPERM;
+}
+
+static int intel_vgpu_dmabuf_mgr_fd_release(struct inode *inode,
+   struct file *filp)
+{
+   struct intel_vgpu *vgpu = filp->private_data;
+
+   if (vgpu->vdev.vfio_device != NULL)
+   vfio_device_put(vgpu->vdev.vfio_device);
+   else
+   gvt_vgpu_err("intel vgpu dmabuf mgr fd is in a wrong state\n");
+
+   return 0;
+}
+
+static long intel_vgpu_dmabuf_mgr_fd_ioctl(struct file *filp,
+   unsigned int ioctl, unsigned long arg)
+{
+   struct intel_vgpu *vgpu = filp->private_data;
+   int minsz;
+   struct intel_vgpu_dmabuf dmabuf;
+   int ret;
+   struct fd f;
+
+   minsz = offsetofend(struct intel_vgpu_dmabuf, tiled);
+   if (copy_from_user(&dmabuf, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   f = fdget(dmabuf.fd);
+
+   if (ioctl == INTEL_VGPU_QUERY_DMABUF)
+   ret = intel_gvt_ops->vgpu_query_dmabuf(vgpu, &dmabuf);
+   else if (ioctl == INTEL_VGPU_GENERATE_DMABUF)
+   ret = intel_gvt_ops->vgpu_create_dmabuf(vgpu, &dmabuf);
+   else {
+   fdput(f);
+   gvt_vgpu_err("unsupported dmabuf operation\n");
+   return -EINVAL;
+   }
+
+   if (ret != 0) {
+   fdput(f);
+   gvt_vgpu_err("gvt-g get dmabuf failed:%d\n", ret);
+   return -EINVAL;
+   }
+   fdput(f);
+
+   return copy_to_user((void __user *)arg, &dmabuf, minsz) ? -EFAULT : 0;
+}
+
+static const struct file_operations intel_vgpu_dmabuf_mgr_fd_ops = {
+   .release= intel_vgpu_dmabuf_mgr_fd_release,
+   .unlocked_ioctl = intel_vgpu_dmabuf_mgr_fd_ioctl,
+   .mmap   = intel_vgpu_dmabuf_mgr_fd_mmap,
+   .llseek = noop_llseek,
+};
 static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
 {
struct intel_vgpu *vgpu = NULL;
@@ -1259,6 +1320,34 @@ static long intel_vgpu_ioctl(struct mdev_device *mdev, unsigned int cmd,
} else if (cmd == VFIO_DEVICE_RESET) {
intel_gvt_ops->vgpu_reset(vgpu);
return 0;
+   } else if (cmd == VFIO_DEVICE_GET_FD) {
+   int fd;
+   u32 type;
+   struct vfio_device *device;
+
+   if (copy_from_user(&type, (void __user *)arg, sizeof(type)))
+   return -EINVAL;
+   if (type != INTEL_VGPU_DMABUF_MGR_FD) {
+  

[Intel-gfx] [PATCH v2 3/5] drm/i915/gvt: Frame buffer decoder support for GVT-g

2017-05-18 Thread Xiaoguang Chen
Decode framebuffer attributes of the primary, cursor and sprite planes.

Signed-off-by: Xiaoguang Chen 
---
 drivers/gpu/drm/i915/gvt/Makefile |   3 +-
 drivers/gpu/drm/i915/gvt/display.c|   2 +-
 drivers/gpu/drm/i915/gvt/display.h|   2 +
 drivers/gpu/drm/i915/gvt/fb_decoder.c | 487 ++
 drivers/gpu/drm/i915/gvt/fb_decoder.h | 166 
 drivers/gpu/drm/i915/gvt/gvt.h|   1 +
 include/uapi/drm/i915_drm.h   |   6 +
 7 files changed, 665 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gvt/fb_decoder.c
 create mode 100644 drivers/gpu/drm/i915/gvt/fb_decoder.h

diff --git a/drivers/gpu/drm/i915/gvt/Makefile b/drivers/gpu/drm/i915/gvt/Makefile
index b123c20..192ca26 100644
--- a/drivers/gpu/drm/i915/gvt/Makefile
+++ b/drivers/gpu/drm/i915/gvt/Makefile
@@ -1,7 +1,8 @@
 GVT_DIR := gvt
 GVT_SOURCE := gvt.o aperture_gm.o handlers.o vgpu.o trace_points.o firmware.o \
interrupt.o gtt.o cfg_space.o opregion.o mmio.o display.o edid.o \
-   execlist.o scheduler.o sched_policy.o render.o cmd_parser.o
+   execlist.o scheduler.o sched_policy.o render.o cmd_parser.o \
+   fb_decoder.o
 
 ccflags-y  += -I$(src) -I$(src)/$(GVT_DIR) -Wall
i915-y += $(addprefix $(GVT_DIR)/, $(GVT_SOURCE))
diff --git a/drivers/gpu/drm/i915/gvt/display.c b/drivers/gpu/drm/i915/gvt/display.c
index e0261fc..f5f63c5 100644
--- a/drivers/gpu/drm/i915/gvt/display.c
+++ b/drivers/gpu/drm/i915/gvt/display.c
@@ -67,7 +67,7 @@ static int edp_pipe_is_enabled(struct intel_vgpu *vgpu)
return 1;
 }
 
-static int pipe_is_enabled(struct intel_vgpu *vgpu, int pipe)
+int pipe_is_enabled(struct intel_vgpu *vgpu, int pipe)
 {
struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
 
diff --git a/drivers/gpu/drm/i915/gvt/display.h b/drivers/gpu/drm/i915/gvt/display.h
index d73de22..b46b868 100644
--- a/drivers/gpu/drm/i915/gvt/display.h
+++ b/drivers/gpu/drm/i915/gvt/display.h
@@ -179,4 +179,6 @@ static inline char *vgpu_edid_str(enum intel_vgpu_edid id)
 void intel_vgpu_reset_display(struct intel_vgpu *vgpu);
 void intel_vgpu_clean_display(struct intel_vgpu *vgpu);
 
+int pipe_is_enabled(struct intel_vgpu *vgpu, int pipe);
+
 #endif
diff --git a/drivers/gpu/drm/i915/gvt/fb_decoder.c b/drivers/gpu/drm/i915/gvt/fb_decoder.c
new file mode 100644
index 000..954f047
--- /dev/null
+++ b/drivers/gpu/drm/i915/gvt/fb_decoder.c
@@ -0,0 +1,487 @@
+/*
+ * Copyright(c) 2011-2016 Intel Corporation. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Authors:
+ *Kevin Tian 
+ *
+ * Contributors:
+ *Bing Niu 
+ *Xu Han 
+ *Ping Gao 
+ *Xiaoguang Chen 
+ *Yang Liu 
+ *
+ */
+
+#include 
+#include "i915_drv.h"
+#include "gvt.h"
+
+/* The below definitions are required by guest. */
+/* [63:0] x:R:G:B 16:16:16:16 little endian */
+#define DRM_FORMAT_XRGB161616_GVT  fourcc_code('X', 'R', '4', '8')
+/* [63:0] x:B:G:R 16:16:16:16 little endian */
+#define DRM_FORMAT_XBGR161616_GVT  fourcc_code('X', 'B', '4', '8')
+
+#define FORMAT_NUM 16
+struct pixel_format {
+   int drm_format; /* Pixel format in DRM definition */
+   int bpp;/* Bits per pixel, 0 indicates invalid */
+   char *desc; /* The description */
+};
+
+/* non-supported format has bpp default to 0 */
+static struct pixel_format primary_pixel_formats[FORMAT_NUM] = {
+   [0x2] = {DRM_FORMAT_C8, 8, "8-bit Indexed"},
+   [0x5] = {DRM_FORMAT_RGB565, 16, "16-bit BGRX (5:6:5 MSB-R:G:B)"},
+   [0x6] = {DRM_FORMAT_XRGB8888, 32,
+   "32-bit BGRX (8:8:8:8 MSB-X:R:G:B)"},
+   [0x8] = {DRM_FORMAT_XBGR2101010, 32,
+   "32-bit RGBX (2:10:10:10 MSB-X:B:G:R)"},
+   [0xa] = {DRM_FORMAT_XRGB2101010, 32,
+   "32-bit BGRX (2:10:10:10 MSB-X:R:G:B)"},
+   

[Intel-gfx] [PATCH v2 1/5] drm/i915/gvt: Extend the GVT-g architecture to support vfio device region

2017-05-18 Thread Xiaoguang Chen
Signed-off-by: Xiaoguang Chen 
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 1ae0b40..3c6a02b 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -53,11 +53,21 @@
 #define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
 #define VFIO_PCI_OFFSET_MASK(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
 
+struct vfio_region;
+struct intel_vgpu_regops {
+   size_t (*rw)(struct intel_vgpu *vgpu, char *buf,
+   size_t count, loff_t *ppos, bool iswrite);
+   void (*release)(struct intel_vgpu *vgpu,
+   struct vfio_region *region);
+};
+
 struct vfio_region {
u32 type;
u32 subtype;
size_t  size;
u32 flags;
+   const struct intel_vgpu_regops  *ops;
+   void*data;
 };
 
 struct kvmgt_pgfn {
@@ -642,7 +652,7 @@ static ssize_t intel_vgpu_rw(struct mdev_device *mdev, char *buf,
int ret = -EINVAL;
 
 
-   if (index >= VFIO_PCI_NUM_REGIONS) {
+   if (index >= VFIO_PCI_NUM_REGIONS + vgpu->vdev.num_regions) {
gvt_vgpu_err("invalid index: %u\n", index);
return -EINVAL;
}
@@ -676,8 +686,11 @@ static ssize_t intel_vgpu_rw(struct mdev_device *mdev, char *buf,
case VFIO_PCI_BAR5_REGION_INDEX:
case VFIO_PCI_VGA_REGION_INDEX:
case VFIO_PCI_ROM_REGION_INDEX:
+   break;
default:
-   gvt_vgpu_err("unsupported region: %u\n", index);
+   index -= VFIO_PCI_NUM_REGIONS;
+   return vgpu->vdev.region[index].ops->rw(vgpu, buf, count,
+   ppos, is_write);
}
 
return ret == 0 ? count : ret;
@@ -940,7 +953,8 @@ static long intel_vgpu_ioctl(struct mdev_device *mdev, unsigned int cmd,
 
info.flags = VFIO_DEVICE_FLAGS_PCI;
info.flags |= VFIO_DEVICE_FLAGS_RESET;
-   info.num_regions = VFIO_PCI_NUM_REGIONS;
+   info.num_regions = VFIO_PCI_NUM_REGIONS +
+   vgpu->vdev.num_regions;
info.num_irqs = VFIO_PCI_NUM_IRQS;
 
return copy_to_user((void __user *)arg, &info, minsz) ?
@@ -1061,6 +1075,7 @@ static long intel_vgpu_ioctl(struct mdev_device *mdev, unsigned int cmd,
}
 
if (caps.size) {
+   info.flags |= VFIO_REGION_INFO_FLAG_CAPS;
if (info.argsz < sizeof(info) + caps.size) {
info.argsz = sizeof(info) + caps.size;
info.cap_offset = 0;
-- 
1.9.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v2 2/5] drm/i915/gvt: OpRegion support for GVT-g

2017-05-18 Thread Xiaoguang Chen
OpRegion is needed to support display-related operations for the
Intel vgpu.

A vfio device region is added to the Intel vgpu to deliver the host
OpRegion information to user space, so user space can construct the
OpRegion for the vgpu.

Signed-off-by: Bing Niu 
Signed-off-by: Xiaoguang Chen 
---
 drivers/gpu/drm/i915/gvt/kvmgt.c| 97 +
 drivers/gpu/drm/i915/gvt/opregion.c | 12 -
 2 files changed, 107 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 3c6a02b..389f072 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -53,6 +53,8 @@
 #define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
 #define VFIO_PCI_OFFSET_MASK(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
 
+#define OPREGION_SIGNATURE "IntelGraphicsMem"
+
 struct vfio_region;
 struct intel_vgpu_regops {
size_t (*rw)(struct intel_vgpu *vgpu, char *buf,
@@ -436,6 +438,92 @@ static void kvmgt_protect_table_del(struct kvmgt_guest_info *info,
}
 }
 
+static size_t intel_vgpu_reg_rw_opregion(struct intel_vgpu *vgpu, char *buf,
+   size_t count, loff_t *ppos, bool iswrite)
+{
+   unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) -
+   VFIO_PCI_NUM_REGIONS;
+   void *base = vgpu->vdev.region[i].data;
+   loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+
+   if (pos >= vgpu->vdev.region[i].size || iswrite) {
+   gvt_vgpu_err("invalid op or offset for Intel vgpu OpRegion\n");
+   return -EINVAL;
+   }
+   count = min(count, (size_t)(vgpu->vdev.region[i].size - pos));
+   memcpy(buf, base + pos, count);
+
+   return count;
+}
+
+static void intel_vgpu_reg_release_opregion(struct intel_vgpu *vgpu,
+   struct vfio_region *region)
+{
+   memunmap(region->data);
+}
+
+static const struct intel_vgpu_regops intel_vgpu_regops_opregion = {
+   .rw = intel_vgpu_reg_rw_opregion,
+   .release = intel_vgpu_reg_release_opregion,
+};
+
+static int intel_vgpu_register_reg(struct intel_vgpu *vgpu,
+   unsigned int type, unsigned int subtype,
+   const struct intel_vgpu_regops *ops,
+   size_t size, u32 flags, void *data)
+{
+   struct vfio_region *region;
+
+   region = krealloc(vgpu->vdev.region,
+   (vgpu->vdev.num_regions + 1) * sizeof(*region),
+   GFP_KERNEL);
+   if (!region)
+   return -ENOMEM;
+
+   vgpu->vdev.region = region;
+   vgpu->vdev.region[vgpu->vdev.num_regions].type = type;
+   vgpu->vdev.region[vgpu->vdev.num_regions].subtype = subtype;
+   vgpu->vdev.region[vgpu->vdev.num_regions].ops = ops;
+   vgpu->vdev.region[vgpu->vdev.num_regions].size = size;
+   vgpu->vdev.region[vgpu->vdev.num_regions].flags = flags;
+   vgpu->vdev.region[vgpu->vdev.num_regions].data = data;
+   vgpu->vdev.num_regions++;
+
+   return 0;
+}
+
+static int intel_vgpu_reg_init_opregion(struct intel_vgpu *vgpu)
+{
+   unsigned int addr;
+   void *base;
+   int ret;
+
+   addr = vgpu->gvt->opregion.opregion_pa;
+   if (!addr || !(~addr))
+   return -ENODEV;
+
+   base = memremap(addr, OPREGION_SIZE, MEMREMAP_WB);
+   if (!base)
+   return -ENOMEM;
+
+   if (memcmp(base, OPREGION_SIGNATURE, 16)) {
+   memunmap(base);
+   return -EINVAL;
+   }
+
+   ret = intel_vgpu_register_reg(vgpu,
+   PCI_VENDOR_ID_INTEL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
+   VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION,
+   &intel_vgpu_regops_opregion, OPREGION_SIZE,
+   VFIO_REGION_INFO_FLAG_READ, base);
+   if (ret) {
+   memunmap(base);
+   return ret;
+   }
+
+   return ret;
+}
+
 static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
 {
struct intel_vgpu *vgpu = NULL;
@@ -467,6 +555,15 @@ static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
vgpu->vdev.mdev = mdev;
mdev_set_drvdata(mdev, vgpu);
 
+   ret = intel_vgpu_reg_init_opregion(vgpu);
+   if (ret) {
+   gvt_vgpu_err("create OpRegion failed\n");
+   goto out;
+   }
+
+   gvt_dbg_core("create OpRegion succeeded for mdev:%s\n",
+   dev_name(mdev_dev(mdev)));
+
gvt_dbg_core("intel_vgpu_create succeeded for mdev: %s\n",
 dev_name(mdev_dev(mdev)));
ret = 0;
diff --git a/drivers/gpu/drm/i915/gvt/opregion.c b/drivers/gpu/drm/i915/gvt/opregion.c
index 3117991..99591bc 100644
--- a/drivers/gpu/drm/i915/gvt/opregion.c
+++ b/drivers/gpu/drm/i915/gvt/opregion.c
@@ -283,14 +283,22 @@ int intel_vgpu_emulate_opregion_request(struct intel_vgpu *vgpu, u32 swsci)
 {
u32 *scic, *parm;
u32 f

[Intel-gfx] [PATCH v2 0/5] drm/i915/gvt: Dma-buf support for GVT-g

2017-05-18 Thread Xiaoguang Chen
v1->v2:
1) create a management fd for dma-buf operations.
2) alloc gem object's backing storage in gem obj's get_pages() callback.

This patch set adds dma-buf support for Intel GVT-g.
dma-buf is a uniform mechanism to share DMA buffers across different
devices and sub-systems.
For Intel GVT-g, dma-buf is mainly used to share the vgpu's framebuffer
with other users or sub-systems, so they can use the dma-buf to show the
desktop of a VM that uses an Intel vgpu.

The main idea is to create a gem object and set the vgpu's framebuffer
as its backing storage, associate this gem object with a dma-buf object,
export the dma-buf, and at the same time generate a file descriptor for
it. This file descriptor is then delivered to user space, which can use
it for rendering or other operations.
User space first creates a fd (for Intel GVT-g dma-buf support this is
the dma-buf management fd) and can then use it to query plane
information or create a dma-buf. The life cycle of this fd is managed by
GVT-g, so user space does not need to care about it.

We have an example program showing how to use the dma-buf. You can
download it to have a try. Good luck :)
git repo: https://github.com/01org/igvtg-qemu branch:kvmgt_dmabuf_example.

Xiaoguang Chen (5):
  drm/i915/gvt: Extend the GVT-g architecture to support vfio device
region
  drm/i915/gvt: OpRegion support for GVT-g
  drm/i915/gvt: Frame buffer decoder support for GVT-g
  drm/i915/gvt: Dmabuf support for GVT-g
  drm/i915/gvt: Adding interface so user space can get the dma-buf

 drivers/gpu/drm/i915/gvt/Makefile |   3 +-
 drivers/gpu/drm/i915/gvt/display.c|   2 +-
 drivers/gpu/drm/i915/gvt/display.h|   2 +
 drivers/gpu/drm/i915/gvt/dmabuf.c | 321 ++
 drivers/gpu/drm/i915/gvt/dmabuf.h |  44 +++
 drivers/gpu/drm/i915/gvt/fb_decoder.c | 487 ++
 drivers/gpu/drm/i915/gvt/fb_decoder.h | 166 
 drivers/gpu/drm/i915/gvt/gvt.c|   2 +
 drivers/gpu/drm/i915/gvt/gvt.h|   7 +
 drivers/gpu/drm/i915/gvt/kvmgt.c  | 207 ++-
 drivers/gpu/drm/i915/gvt/opregion.c   |  12 +-
 include/uapi/drm/i915_drm.h   |  29 ++
 include/uapi/linux/vfio.h |  12 +
 13 files changed, 1287 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gvt/dmabuf.c
 create mode 100644 drivers/gpu/drm/i915/gvt/dmabuf.h
 create mode 100644 drivers/gpu/drm/i915/gvt/fb_decoder.c
 create mode 100644 drivers/gpu/drm/i915/gvt/fb_decoder.h

-- 
1.9.1



Re: [Intel-gfx] [PATCH v3] drm: Add DRM_MODE_ROTATE_ and DRM_MODE_REFLECT_ to UAPI

2017-05-18 Thread Ville Syrjälä
On Wed, May 17, 2017 at 09:39:11PM -0400, Robert Foss wrote:
> +/*
> + * DRM_MODE_REFLECT_
> + *
> + * Signals that the contents of a drm plane has been reflected in
> + * the  axis.

Still vague.

Also you didn't respond to my comment about the use of past tense.

> + *
> + * This define is provided as a convenience, looking up the property id
> + * using the name->prop id lookup is the preferred method.
> + */
> +#define DRM_MODE_REFLECT_X  (1<<4)
> +#define DRM_MODE_REFLECT_Y  (1<<5)
> +
> +/*
> + * DRM_MODE_REFLECT_MASK
> + *
> + * Bitmask used to look for drm plane reflections.
> + */
> +#define DRM_MODE_REFLECT_MASK (\
> + DRM_MODE_REFLECT_X | \
> + DRM_MODE_REFLECT_Y)
> +
> +
>  struct drm_mode_modeinfo {
>   __u32 clock;
>   __u16 hdisplay;
> -- 
> 2.11.0.453.g787f75f05

-- 
Ville Syrjälä
Intel OTC


Re: [Intel-gfx] [PATCH v2 2/3] drm: Create a format/modifier blob

2017-05-18 Thread Emil Velikov
On 18 May 2017 at 01:46, Ben Widawsky  wrote:

>>> +   blob_size += modifiers_size;
>>> +
>>> +   blob = drm_property_create_blob(dev, blob_size, NULL);
>>> +   if (IS_ERR(blob))
>>> +   return -1;
>>> +
>>
>> Maybe propagate the exact error... Hmm we don't seem to check if
>> create_in_format_blob fails either so perhaps it's not worth it.
>>
>>
>
> In this case it's almost definitely ENOMEM. All values should be verified -
> I
> think the other errors are there for when userspace is utilizing blob
> creation.
>
> So I'll just leave it.
>
Ack, sure thing.

>>> --- a/include/uapi/drm/drm_mode.h
>>> +++ b/include/uapi/drm/drm_mode.h
>>> @@ -665,6 +665,56 @@ struct drm_mode_atomic {
>>> __u64 user_data;
>>>  };
>>>
>>> +struct drm_format_modifier_blob {
>>> +#define FORMAT_BLOB_CURRENT 1
>>> +   /* Version of this blob format */
>>> +   u32 version;
>>> +
>>> +   /* Flags */
>>> +   u32 flags;
>>> +
>>> +   /* Number of fourcc formats supported */
>>> +   u32 count_formats;
>>> +
>>> +   /* Where in this blob the formats exist (in bytes) */
>>> +   u32 formats_offset;
>>> +
>>> +   /* Number of drm_format_modifiers */
>>> +   u32 count_modifiers;
>>> +
>>> +   /* Where in this blob the modifiers exist (in bytes) */
>>> +   u32 modifiers_offset;
>>> +
>>> +   /* u32 formats[] */
>>> +   /* struct drm_format_modifier modifiers[] */
>>> +};
>>> +
>>> +struct drm_format_modifier {
>>> +   /* Bitmask of formats in get_plane format list this info applies
>>> to. The
>>> +* offset allows a sliding window of which 64 formats (bits).
>>> +*
>>> +* Some examples:
>>> +* In today's world with < 65 formats, and formats 0, and 2 are
>>> +* supported
>>> +* 0x0005
>>> +*^-offset = 0, formats = 5
>>> +*
>>> +* If the number formats grew to 128, and formats 98-102 are
>>> +* supported with the modifier:
>>> +*
>>> +* 0x003c 
>>> +*^
>>> +*|__offset = 64, formats = 0x3c
>>> +*
>>> +*/
>>> +   __u64 formats;
>>> +   __u32 offset;
>>> +   __u32 pad;
>>> +
>>> +   /* The modifier that applies to the >get_plane format list
>>> bitmask. */
>>> +   __u64 modifier;
>>
>> Please drop the leading __ from the type names in UAPI headers.
>>
>
> Many other structures have the __, can you explain why please (this has
> probably
> been beaten to death already; but I don't know)?
>
Got confused there for a moment, apologies.

Using the __ types is correct. It is drm_format_modifier_blob that
should be updated to follow suit.

-Emil


[Intel-gfx] [PATCH 16/24] drm/i915: Async GPU relocation processing

2017-05-18 Thread Chris Wilson
If the user requires patching of their batch or auxiliary buffers, we
currently make the alterations on the cpu. If they are active on the GPU
at the time, we wait under the struct_mutex for them to finish executing
before we rewrite the contents. This happens if shared relocation trees
are used between different contexts with separate address space (and the
buffers then have different addresses in each), the 3D state will need
to be adjusted between execution on each context. However, we don't need
to use the CPU to do the relocation patching, as we could queue commands
to the GPU to perform it and use fences to serialise the operation with
the current activity and future - so the operation on the GPU appears
just as atomic as performing it immediately. Performing the relocation
rewrites on the GPU is not free, in terms of pure throughput, the number
of relocations/s is about halved - but more importantly so is the time
under the struct_mutex.

v2: Break out the request/batch allocation for clearer error flow.
v3: A few asserts to ensure rq ordering is maintained

Signed-off-by: Chris Wilson 
Reviewed-by: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_gem.c|   1 -
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 227 -
 2 files changed, 220 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 827dfe7b6937..5dcd51655a81 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4373,7 +4373,6 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
GEM_BUG_ON(i915_gem_object_is_active(obj));
list_for_each_entry_safe(vma, vn,
 &obj->vma_list, obj_link) {
-   GEM_BUG_ON(!i915_vma_is_ggtt(vma));
GEM_BUG_ON(i915_vma_is_active(vma));
vma->flags &= ~I915_VMA_PIN_MASK;
i915_vma_close(vma);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bc2bb8b19bbf..9715099664ac 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -40,7 +40,12 @@
 #include "intel_drv.h"
 #include "intel_frontbuffer.h"
 
-#define DBG_USE_CPU_RELOC 0 /* -1 force GTT relocs; 1 force CPU relocs */
+enum {
+   FORCE_CPU_RELOC = 1,
+   FORCE_GTT_RELOC,
+   FORCE_GPU_RELOC,
+#define DBG_FORCE_RELOC 0 /* choose one of the above! */
+};
 
 #define  __EXEC_OBJECT_HAS_REF BIT(31)
 #define  __EXEC_OBJECT_HAS_PIN BIT(30)
@@ -212,10 +217,15 @@ struct i915_execbuffer {
struct drm_mm_node node; /** temporary GTT binding */
unsigned long vaddr; /** Current kmap address */
unsigned long page; /** Currently mapped page index */
+   unsigned int gen; /** Cached value of INTEL_GEN */
bool use_64bit_reloc : 1;
bool has_llc : 1;
bool has_fence : 1;
bool needs_unfenced : 1;
+
+   struct drm_i915_gem_request *rq;
+   u32 *rq_cmd;
+   unsigned int rq_size;
} reloc_cache;
 
u64 invalid_flags; /** Set of execobj.flags that are invalid */
@@ -498,8 +508,11 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
if (!i915_gem_object_has_struct_page(obj))
return false;
 
-   if (DBG_USE_CPU_RELOC)
-   return DBG_USE_CPU_RELOC > 0;
+   if (DBG_FORCE_RELOC == FORCE_CPU_RELOC)
+   return true;
+
+   if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
+   return false;
 
return (cache->has_llc ||
obj->cache_dirty ||
@@ -902,6 +915,8 @@ static void eb_release_vmas(const struct i915_execbuffer *eb)
 
 static void eb_destroy(const struct i915_execbuffer *eb)
 {
+   GEM_BUG_ON(eb->reloc_cache.rq);
+
if (eb->lut_size >= 0)
kfree(eb->buckets);
 }
@@ -919,11 +934,14 @@ static void reloc_cache_init(struct reloc_cache *cache,
cache->page = -1;
cache->vaddr = 0;
/* Must be a variable in the struct to allow GCC to unroll. */
+   cache->gen = INTEL_GEN(i915);
cache->has_llc = HAS_LLC(i915);
-   cache->has_fence = INTEL_GEN(i915) < 4;
-   cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
cache->use_64bit_reloc = HAS_64BIT_RELOC(i915);
+   cache->has_fence = cache->gen < 4;
+   cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
cache->node.allocated = false;
+   cache->rq = NULL;
+   cache->rq_size = 0;
 }
 
 static inline void *unmask_page(unsigned long p)
@@ -945,10 +963,24 @@ static inline struct i915_ggtt *cache_to_ggtt(struct reloc_cache *cache)
return &i915->ggtt;
 }
 
+static void reloc_gpu_flush(struct reloc_cache *cache)
+{
+ 

[Intel-gfx] [PATCH 21/24] drm/i915: Allow contexts to be unreferenced locklessly

2017-05-18 Thread Chris Wilson
If we move the actual cleanup of the context to a worker, we can allow
the final free to be called from any context and avoid undue latency in
the caller.

v2: Negotiate handling the delayed contexts free by flushing the
workqueue before calling i915_gem_context_fini() and performing the final
free of the kernel context directly
v3: Flush deferred frees before new context allocations

Signed-off-by: Chris Wilson 
Reviewed-by: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/gvt/scheduler.c |  2 +-
 drivers/gpu/drm/i915/i915_drv.c  |  2 +
 drivers/gpu/drm/i915/i915_drv.h  | 23 +
 drivers/gpu/drm/i915/i915_gem_context.c  | 59 +++-
 drivers/gpu/drm/i915/i915_gem_context.h  | 15 +-
 drivers/gpu/drm/i915/i915_perf.c |  4 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c|  8 +++-
 drivers/gpu/drm/i915/selftests/mock_context.c|  9 
 drivers/gpu/drm/i915/selftests/mock_context.h|  2 +
 drivers/gpu/drm/i915/selftests/mock_gem_device.c |  3 +-
 10 files changed, 88 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index 6ae286cb5804..84c2495df372 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -595,7 +595,7 @@ int intel_gvt_init_workload_scheduler(struct intel_gvt *gvt)
 
 void intel_vgpu_clean_gvt_context(struct intel_vgpu *vgpu)
 {
-   i915_gem_context_put_unlocked(vgpu->shadow_ctx);
+   i915_gem_context_put(vgpu->shadow_ctx);
 }
 
 int intel_vgpu_init_gvt_context(struct intel_vgpu *vgpu)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 6c91bc9f7e2b..2d2fb4327f97 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -550,6 +550,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
 
 static void i915_gem_fini(struct drm_i915_private *dev_priv)
 {
+   flush_workqueue(dev_priv->wq);
+
mutex_lock(&dev_priv->drm.struct_mutex);
intel_uc_fini_hw(dev_priv);
i915_gem_cleanup_engines(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 61a0c81164cc..f11c06bd977e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2313,6 +2313,8 @@ struct drm_i915_private {
 
struct {
struct list_head list;
+   struct llist_head free_list;
+   struct work_struct free_work;
 
/* The hw wants to have a stable context identifier for the
 * lifetime of the context (for OA, PASID, faults, etc).
@@ -3497,27 +3499,6 @@ i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id)
return ctx;
 }
 
-static inline struct i915_gem_context *
-i915_gem_context_get(struct i915_gem_context *ctx)
-{
-   kref_get(&ctx->ref);
-   return ctx;
-}
-
-static inline void i915_gem_context_put(struct i915_gem_context *ctx)
-{
-   lockdep_assert_held(&ctx->i915->drm.struct_mutex);
-   kref_put(&ctx->ref, i915_gem_context_free);
-}
-
-static inline void i915_gem_context_put_unlocked(struct i915_gem_context *ctx)
-{
-   struct mutex *lock = &ctx->i915->drm.struct_mutex;
-
-   if (kref_put_mutex(&ctx->ref, i915_gem_context_free, lock))
-   mutex_unlock(lock);
-}
-
 static inline struct intel_timeline *
 i915_gem_context_lookup_timeline(struct i915_gem_context *ctx,
 struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f4395a1e214d..2ad78bb50137 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -158,13 +158,11 @@ static void vma_lut_free(struct i915_gem_context *ctx)
kvfree(lut->ht);
 }
 
-void i915_gem_context_free(struct kref *ctx_ref)
+static void i915_gem_context_free(struct i915_gem_context *ctx)
 {
-   struct i915_gem_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
int i;
 
lockdep_assert_held(&ctx->i915->drm.struct_mutex);
-   trace_i915_context_free(ctx);
GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
 
vma_lut_free(ctx);
@@ -192,6 +190,37 @@ void i915_gem_context_free(struct kref *ctx_ref)
kfree(ctx);
 }
 
+static void contexts_free(struct drm_i915_private *i915)
+{
+   struct llist_node *freed = llist_del_all(&i915->contexts.free_list);
+   struct i915_gem_context *ctx;
+
+   lockdep_assert_held(&i915->drm.struct_mutex);
+
+   llist_for_each_entry(ctx, freed, free_link)
+   i915_gem_context_free(ctx);
+}
+
+static void contexts_free_worker(struct work_struct *work)
+{
+   struct drm_i915_private *i915 =
+   container_of(work, typeof(*i915), contexts.free_work);
+
+   mutex_lock(&i915->drm.struct_mutex);
+   contexts_f

[Intel-gfx] [PATCH 07/24] drm/i915: Split vma exec_link/evict_link

2017-05-18 Thread Chris Wilson
Currently the vma has one link member that is used for both holding its
place in the execbuf reservation list, and in any eviction list. This
dual property is quite tricky and error prone.

Signed-off-by: Chris Wilson 
Reviewed-by: Mika Kuoppala 
---
 drivers/gpu/drm/i915/i915_gem_evict.c  | 14 ++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 32 +++---
 drivers/gpu/drm/i915/i915_vma.h|  7 +--
 3 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 891247d79299..204a2d9288ae 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -62,7 +62,7 @@ mark_free(struct drm_mm_scan *scan,
if (flags & PIN_NONFAULT && !list_empty(&vma->obj->userfault_link))
return false;
 
-   list_add(&vma->exec_list, unwind);
+   list_add(&vma->evict_link, unwind);
return drm_mm_scan_add_block(scan, &vma->node);
 }
 
@@ -154,7 +154,7 @@ i915_gem_evict_something(struct i915_address_space *vm,
} while (*++phase);
 
/* Nothing found, clean up and bail out! */
-   list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+   list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
ret = drm_mm_scan_remove_block(&scan, &vma->node);
BUG_ON(ret);
}
@@ -200,16 +200,16 @@ i915_gem_evict_something(struct i915_address_space *vm,
 * calling unbind (which may remove the active reference
 * of any of our objects, thus corrupting the list).
 */
-   list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+   list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
if (drm_mm_scan_remove_block(&scan, &vma->node))
__i915_vma_pin(vma);
else
-   list_del(&vma->exec_list);
+   list_del(&vma->evict_link);
}
 
/* Unbinding will emit any required flushes */
ret = 0;
-   list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+   list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
__i915_vma_unpin(vma);
if (ret == 0)
ret = i915_vma_unbind(vma);
@@ -322,10 +322,10 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
 * reference) another in our eviction list.
 */
__i915_vma_pin(vma);
-   list_add(&vma->exec_list, &eviction_list);
+   list_add(&vma->evict_link, &eviction_list);
}
 
-   list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+   list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
__i915_vma_unpin(vma);
if (ret == 0)
ret = i915_vma_unbind(vma);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index fee86e62c934..ba1bf8a0123d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -137,7 +137,7 @@ eb_reset(struct i915_execbuffer *eb)
 {
struct i915_vma *vma;
 
-   list_for_each_entry(vma, &eb->vmas, exec_list) {
+   list_for_each_entry(vma, &eb->vmas, exec_link) {
eb_unreserve_vma(vma);
i915_vma_put(vma);
vma->exec_entry = NULL;
@@ -150,7 +150,7 @@ eb_reset(struct i915_execbuffer *eb)
 static struct i915_vma *
 eb_get_batch(struct i915_execbuffer *eb)
 {
-   struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), 
exec_list);
+   struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), 
exec_link);
 
/*
 * SNA is doing fancy tricks with compressing batch buffers, which leads
@@ -227,7 +227,7 @@ eb_lookup_vmas(struct i915_execbuffer *eb)
}
 
/* Transfer ownership from the objects list to the vmas list. */
-   list_add_tail(&vma->exec_list, &eb->vmas);
+   list_add_tail(&vma->exec_link, &eb->vmas);
list_del_init(&obj->obj_exec_link);
 
vma->exec_entry = &eb->exec[i];
@@ -286,7 +286,7 @@ static void eb_destroy(struct i915_execbuffer *eb)
 {
struct i915_vma *vma;
 
-   list_for_each_entry(vma, &eb->vmas, exec_list) {
+   list_for_each_entry(vma, &eb->vmas, exec_link) {
if (!vma->exec_entry)
continue;
 
@@ -752,7 +752,7 @@ static int eb_relocate(struct i915_execbuffer *eb)
struct i915_vma *vma;
int ret = 0;
 
-   list_for_each_entry(vma, &eb->vmas, exec_list) {
+   list_for_each_entry(vma, &eb->vmas, exec_link) {
ret = eb_relocate_vma(vma, eb);
if (ret)
break;
@@ -905,7 +905,7 @@ static int eb_rese

[Intel-gfx] [PATCH 24/24] RFC drm/i915: Expose a PMU interface for perf queries

2017-05-18 Thread Chris Wilson
The first goal is to be able to measure GPU (and individual ring) busyness
without having to poll registers from userspace. (Polling not only requires
holding the forcewake lock indefinitely, perturbing the system, but also
runs the risk of hanging the machine.) As an alternative we can use the
perf event counter interface to sample the ring registers periodically
and send those results to userspace.

To be able to do so, we need to export the two symbols from
kernel/events/core.c to register and unregister a PMU device.

v2: Use a common timer for the ring sampling.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/Makefile   |   1 +
 drivers/gpu/drm/i915/i915_drv.c |   2 +
 drivers/gpu/drm/i915/i915_drv.h |  23 ++
 drivers/gpu/drm/i915/i915_pmu.c | 449 
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 include/uapi/drm/i915_drm.h |  40 +++
 kernel/events/core.c|   1 +
 7 files changed, 518 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_pmu.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 7b05fb802f4c..ca88e6e67910 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -26,6 +26,7 @@ i915-y := i915_drv.o \
 
 i915-$(CONFIG_COMPAT)   += i915_ioc32.o
 i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o
+i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o
 
 # GEM code
 i915-y += i915_cmd_parser.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 2d2fb4327f97..e3c6d052d1c9 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1144,6 +1144,7 @@ static void i915_driver_register(struct drm_i915_private 
*dev_priv)
struct drm_device *dev = &dev_priv->drm;
 
i915_gem_shrinker_init(dev_priv);
+   i915_pmu_register(dev_priv);
 
/*
 * Notify a valid surface after modesetting,
@@ -1197,6 +1198,7 @@ static void i915_driver_unregister(struct 
drm_i915_private *dev_priv)
intel_opregion_unregister(dev_priv);
 
i915_perf_unregister(dev_priv);
+   i915_pmu_unregister(dev_priv);
 
i915_teardown_sysfs(dev_priv);
i915_guc_log_unregister(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1fa1e7d48f02..10beae1a13c8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2075,6 +2076,12 @@ struct intel_cdclk_state {
unsigned int cdclk, vco, ref;
 };
 
+enum {
+   __I915_SAMPLE_FREQ_ACT = 0,
+   __I915_SAMPLE_FREQ_REQ,
+   __I915_NUM_PMU_SAMPLERS
+};
+
 struct drm_i915_private {
struct drm_device drm;
 
@@ -2564,6 +2571,13 @@ struct drm_i915_private {
int irq;
} lpe_audio;
 
+   struct {
+   struct pmu base;
+   struct hrtimer timer;
+   u64 enable;
+   u64 sample[__I915_NUM_PMU_SAMPLERS];
+   } pmu;
+
/*
 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
 * will be rejected. Instead look for a better place.
@@ -3681,6 +3695,15 @@ extern void i915_perf_fini(struct drm_i915_private 
*dev_priv);
 extern void i915_perf_register(struct drm_i915_private *dev_priv);
 extern void i915_perf_unregister(struct drm_i915_private *dev_priv);
 
+/* i915_pmu.c */
+#ifdef CONFIG_PERF_EVENTS
+extern void i915_pmu_register(struct drm_i915_private *i915);
+extern void i915_pmu_unregister(struct drm_i915_private *i915);
+#else
+static inline void i915_pmu_register(struct drm_i915_private *i915) {}
+static inline void i915_pmu_unregister(struct drm_i915_private *i915) {}
+#endif
+
 /* i915_suspend.c */
 extern int i915_save_state(struct drm_i915_private *dev_priv);
 extern int i915_restore_state(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
new file mode 100644
index ..80e1c07841ac
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -0,0 +1,449 @@
+#include 
+#include 
+
+#include "i915_drv.h"
+#include "intel_ringbuffer.h"
+
+#define FREQUENCY 200
+#define PERIOD max_t(u64, 1, NSEC_PER_SEC / FREQUENCY)
+
+#define RING_MASK 0x
+#define RING_MAX 32
+
+static void engines_sample(struct drm_i915_private *dev_priv)
+{
+   struct intel_engine_cs *engine;
+   enum intel_engine_id id;
+   bool fw = false;
+
+   if ((dev_priv->pmu.enable & RING_MASK) == 0)
+   return;
+
+   if (!dev_priv->gt.awake)
+   return;
+
+   if (!intel_runtime_pm_get_if_in_use(dev_priv))
+   return;
+
+   for_each_engine(engine, dev_priv, id) {
+   u32 val;
+
+   if ((dev_priv->pmu.enable & (0x7 << (4*id))) == 0)
+   continue;
+
+   if (i915_seqno

[Intel-gfx] [PATCH 23/24] drm/i915: Keep a recent cache of freed contexts objects for reuse

2017-05-18 Thread Chris Wilson
Keep recently freed context objects for reuse. This allows us to reuse
the current GGTT bindings and dma-bound pages, avoiding any clflushes
that would otherwise be required. We mark the objects as purgeable under
memory pressure, and reap the list of freed objects as soon as the device
is idle.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 +
 drivers/gpu/drm/i915/i915_gem.c  |  1 +
 drivers/gpu/drm/i915/i915_gem_context.c  | 59 ++--
 drivers/gpu/drm/i915/i915_gem_context.h  |  5 ++
 drivers/gpu/drm/i915/intel_lrc.c |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c  |  2 +-
 drivers/gpu/drm/i915/selftests/mock_context.c|  1 +
 drivers/gpu/drm/i915/selftests/mock_gem_device.c |  1 +
 8 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1d55bbde68df..1fa1e7d48f02 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2316,6 +2316,8 @@ struct drm_i915_private {
struct llist_head free_list;
struct work_struct free_work;
 
+   struct list_head freed_objects;
+
/* The hw wants to have a stable context identifier for the
 * lifetime of the context (for OA, PASID, faults, etc).
 * This is limited in execlists to 21 bits.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c044b5dbdb66..f482c320d810 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3212,6 +3212,7 @@ i915_gem_idle_work_handler(struct work_struct *work)
DRM_ERROR("Timeout waiting for engines to idle\n");
 
intel_engines_mark_idle(dev_priv);
+   i915_gem_contexts_mark_idle(dev_priv);
i915_gem_timelines_mark_idle(dev_priv);
 
GEM_BUG_ON(!dev_priv->gt.awake);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 198b0064a3d3..1e51fd36f355 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -96,6 +96,48 @@
 /* Initial size (as log2) to preallocate the handle->object hashtable */
 #define VMA_HT_BITS 2u /* 4 x 2 pointers, 64 bytes minimum */
 
+struct drm_i915_gem_object *
+i915_gem_context_create_object(struct drm_i915_private *i915,
+  unsigned long size)
+{
+   struct drm_i915_gem_object *obj, *on;
+
+   lockdep_assert_held(&i915->drm.struct_mutex);
+
+   list_for_each_entry_safe(obj, on,
+&i915->contexts.freed_objects,
+batch_pool_link) {
+   if (obj->mm.madv != I915_MADV_DONTNEED) {
+   /* Purge the heretic! */
+   list_del(&obj->batch_pool_link);
+   i915_gem_object_put(obj);
+   continue;
+   }
+
+   if (obj->base.size == size) {
+   list_del(&obj->batch_pool_link);
+   obj->mm.madv = I915_MADV_WILLNEED;
+   return obj;
+   }
+   }
+
+   return i915_gem_object_create(i915, size);
+}
+
+void i915_gem_contexts_mark_idle(struct drm_i915_private *i915)
+{
+   struct drm_i915_gem_object *obj, *on;
+
+   lockdep_assert_held(&i915->drm.struct_mutex);
+
+   list_for_each_entry_safe(obj, on,
+&i915->contexts.freed_objects,
+batch_pool_link) {
+   list_del(&obj->batch_pool_link);
+   i915_gem_object_put(obj);
+   }
+}
+
 static void resize_vma_ht(struct work_struct *work)
 {
struct i915_gem_context_vma_lut *lut =
@@ -160,9 +202,10 @@ static void vma_lut_free(struct i915_gem_context *ctx)
 
 static void i915_gem_context_free(struct i915_gem_context *ctx)
 {
+   struct drm_i915_private *i915 = ctx->i915;
int i;
 
-   lockdep_assert_held(&ctx->i915->drm.struct_mutex);
+   lockdep_assert_held(&i915->drm.struct_mutex);
GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
 
vma_lut_free(ctx);
@@ -178,7 +221,11 @@ static void i915_gem_context_free(struct i915_gem_context 
*ctx)
if (ce->ring)
intel_ring_free(ce->ring);
 
-   __i915_gem_object_release_unless_active(ce->state->obj);
+   /* Keep the objects around for quick reuse */
+   GEM_BUG_ON(ce->state->obj->mm.madv != I915_MADV_WILLNEED);
+   ce->state->obj->mm.madv = I915_MADV_DONTNEED;
+   list_add(&ce->state->obj->batch_pool_link,
+&i915->contexts.freed_objects);
}
 
kfree(ctx->name);
@@ -186,7 +233,7 @@ static void i915_gem_context_free(struct i915_gem_context 
*ctx)
 
list_del(&ctx->link);
 
-   ida_simple_remove(&

[Intel-gfx] [PATCH 19/24] drm/i915/scheduler: Support user-defined priorities

2017-05-18 Thread Chris Wilson
Use a priority stored in the context as the initial value when
submitting a request. This allows us to change the default priority on a
per-context basis, allowing different contexts to be favoured with GPU
time at the expense of lower importance work. The user can adjust the
context's priority via I915_CONTEXT_PARAM_PRIORITY, with more positive
values being higher priority (they will be serviced earlier, after their
dependencies have been resolved). Any prerequisite work for an execbuf
will have its priority raised to match the new request as required.

Normal users can specify any value in the range of -1023 to 0 [default],
i.e. they can reduce the priority of their workloads (and temporarily
boost it back to normal if so desired).

Privileged users can specify any value in the range of -1023 to 1023
[default is 0], i.e. they can raise their priority above all others and
so potentially starve the system.

Note that the existing schedulers are neither fair nor load balancing;
execution is strictly by priority on a first-come, first-served basis,
and the driver may choose to boost some requests above the range
available to users.

This priority was originally based around nice(2), but evolved to allow
clients to adjust their priority within a small range, and allow for a
privileged high priority range.

For example, this can be used to implement EGL_IMG_context_priority
https://www.khronos.org/registry/egl/extensions/IMG/EGL_IMG_context_priority.txt

"EGL_CONTEXT_PRIORITY_LEVEL_IMG determines the priority level of
the context to be created. This attribute is a hint, as an
implementation may not support multiple contexts at some
priority levels and system policy may limit access to high
priority contexts to appropriate system privilege level. The
default value for EGL_CONTEXT_PRIORITY_LEVEL_IMG is
EGL_CONTEXT_PRIORITY_MEDIUM_IMG."

so we can map

PRIORITY_HIGH -> 1023 [privileged, will fall back to 0]
PRIORITY_MED -> 0 [default]
PRIORITY_LOW -> -1023

They also map onto the priorities used by VkQueue (and a VkQueue is
essentially a timeline, our i915_gem_context under full-ppgtt).

v2: s/CAP_SYS_ADMIN/CAP_SYS_NICE/
v3: Report min/max user priorities as defines in the uapi, and rebase
internal priorities on the exposed values.

Testcase: igt/gem_exec_schedule
Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_gem_context.c | 23 +++
 drivers/gpu/drm/i915/i915_gem_request.h | 11 ---
 include/uapi/drm/i915_drm.h |  9 -
 3 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index c62e17a68f5b..0735d624ef07 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1038,6 +1038,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
*dev, void *data,
case I915_CONTEXT_PARAM_BANNABLE:
args->value = i915_gem_context_is_bannable(ctx);
break;
+   case I915_CONTEXT_PARAM_PRIORITY:
+   args->value = ctx->priority;
+   break;
default:
ret = -EINVAL;
break;
@@ -1095,6 +1098,26 @@ int i915_gem_context_setparam_ioctl(struct drm_device 
*dev, void *data,
else
i915_gem_context_clear_bannable(ctx);
break;
+
+   case I915_CONTEXT_PARAM_PRIORITY:
+   {
+   int priority = args->value;
+
+   if (args->size)
+   ret = -EINVAL;
+   else if (!to_i915(dev)->engine[RCS]->schedule)
+   ret = -ENODEV;
+   else if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
+priority < I915_CONTEXT_MIN_USER_PRIORITY)
+   ret = -EINVAL;
+   else if (priority > I915_CONTEXT_DEFAULT_PRIORITY &&
+!capable(CAP_SYS_NICE))
+   ret = -EPERM;
+   else
+   ctx->priority = priority;
+   }
+   break;
+
default:
ret = -EINVAL;
break;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h 
b/drivers/gpu/drm/i915/i915_gem_request.h
index 7b7c84369d78..d34d8adba9e9 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -30,6 +30,8 @@
 #include "i915_gem.h"
 #include "i915_sw_fence.h"
 
+#include 
+
 struct drm_file;
 struct drm_i915_gem_object;
 struct drm_i915_gem_request;
@@ -69,9 +71,12 @@ struct i915_priotree {
struct list_head waiters_list; /* those after us, they depend upon us */
struct list_head link;
int priority;
-#define I915_PRIORI

[Intel-gfx] [PATCH 20/24] drm/i915: Group all the global context information together

2017-05-18 Thread Chris Wilson
Create a substruct to hold all the global context state under
drm_i915_private.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  4 +--
 drivers/gpu/drm/i915/i915_drv.c  |  9 +++---
 drivers/gpu/drm/i915/i915_drv.h  | 20 --
 drivers/gpu/drm/i915/i915_gem.c  | 13 -
 drivers/gpu/drm/i915/i915_gem_context.c  | 35 +---
 drivers/gpu/drm/i915/i915_gem_context.h  | 14 ++
 drivers/gpu/drm/i915/i915_sysfs.c|  2 +-
 drivers/gpu/drm/i915/intel_lrc.c |  2 +-
 drivers/gpu/drm/i915/selftests/mock_context.c|  2 +-
 drivers/gpu/drm/i915/selftests/mock_gem_device.c |  4 +--
 10 files changed, 57 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index d72442ce4806..b6d0a6569aae 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1956,7 +1956,7 @@ static int i915_context_status(struct seq_file *m, void 
*unused)
if (ret)
return ret;
 
-   list_for_each_entry(ctx, &dev_priv->context_list, link) {
+   list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
seq_printf(m, "HW context %u ", ctx->hw_id);
if (ctx->pid) {
struct task_struct *task;
@@ -2062,7 +2062,7 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
if (ret)
return ret;
 
-   list_for_each_entry(ctx, &dev_priv->context_list, link)
+   list_for_each_entry(ctx, &dev_priv->contexts.list, link)
for_each_engine(engine, dev_priv, id)
i915_dump_lrc_obj(m, ctx, engine);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 178550a93884..6c91bc9f7e2b 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -553,13 +553,13 @@ static void i915_gem_fini(struct drm_i915_private 
*dev_priv)
mutex_lock(&dev_priv->drm.struct_mutex);
intel_uc_fini_hw(dev_priv);
i915_gem_cleanup_engines(dev_priv);
-   i915_gem_context_fini(dev_priv);
+   i915_gem_contexts_fini(dev_priv);
i915_gem_cleanup_userptr(dev_priv);
mutex_unlock(&dev_priv->drm.struct_mutex);
 
i915_gem_drain_freed_objects(dev_priv);
 
-   WARN_ON(!list_empty(&dev_priv->context_list));
+   WARN_ON(!list_empty(&dev_priv->contexts.list));
 }
 
 static int i915_load_modeset_init(struct drm_device *dev)
@@ -1392,9 +1392,10 @@ static void i915_driver_release(struct drm_device *dev)
 
 static int i915_driver_open(struct drm_device *dev, struct drm_file *file)
 {
+   struct drm_i915_private *i915 = to_i915(dev);
int ret;
 
-   ret = i915_gem_open(dev, file);
+   ret = i915_gem_open(i915, file);
if (ret)
return ret;
 
@@ -1424,7 +1425,7 @@ static void i915_driver_postclose(struct drm_device *dev, 
struct drm_file *file)
struct drm_i915_file_private *file_priv = file->driver_priv;
 
mutex_lock(&dev->struct_mutex);
-   i915_gem_context_close(dev, file);
+   i915_gem_context_close(file);
i915_gem_release(dev, file);
mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 261de5f38956..61a0c81164cc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2233,13 +2233,6 @@ struct drm_i915_private {
DECLARE_HASHTABLE(mm_structs, 7);
struct mutex mm_lock;
 
-   /* The hw wants to have a stable context identifier for the lifetime
-* of the context (for OA, PASID, faults, etc). This is limited
-* in execlists to 21 bits.
-*/
-   struct ida context_hw_ida;
-#define MAX_CONTEXT_HW_ID (1<<21) /* exclusive */
-
/* Kernel Modesetting */
 
struct intel_crtc *plane_to_crtc_mapping[I915_MAX_PIPES];
@@ -2318,7 +2311,16 @@ struct drm_i915_private {
 */
struct mutex av_mutex;
 
-   struct list_head context_list;
+   struct {
+   struct list_head list;
+
+   /* The hw wants to have a stable context identifier for the
+* lifetime of the context (for OA, PASID, faults, etc).
+* This is limited in execlists to 21 bits.
+*/
+   struct ida hw_ida;
+#define MAX_CONTEXT_HW_ID (1<<21) /* exclusive */
+   } contexts;
 
u32 fdi_rx_config;
 
@@ -3450,7 +3452,7 @@ i915_gem_object_pin_to_display_plane(struct 
drm_i915_gem_object *obj,
 void i915_gem_object_unpin_from_display_plane(struct i915_vma *vma);
 int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
int align);
-int i915_gem_open(struct drm_device *dev, struct drm_file *file);
+int i915_gem_open(struct drm_i915_private *i915,

[Intel-gfx] [PATCH 22/24] drm/i915: Enable rcu-only context lookups

2017-05-18 Thread Chris Wilson
Whilst the contents of the context are still protected by the big
struct_mutex, this is not much of an improvement on its own. It is just
one tiny step towards reducing our BKL.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_drv.h| 16 +--
 drivers/gpu/drm/i915/i915_gem_context.c| 77 ++
 drivers/gpu/drm/i915/i915_gem_context.h|  5 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 21 
 4 files changed, 64 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f11c06bd977e..1d55bbde68df 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3486,15 +3486,21 @@ void i915_gem_object_save_bit_17_swizzle(struct 
drm_i915_gem_object *obj,
 struct sg_table *pages);
 
 static inline struct i915_gem_context *
+__i915_gem_context_lookup_rcu(struct drm_i915_file_private *file_priv, u32 id)
+{
+   return idr_find(&file_priv->context_idr, id);
+}
+
+static inline struct i915_gem_context *
 i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id)
 {
struct i915_gem_context *ctx;
 
-   lockdep_assert_held(&file_priv->dev_priv->drm.struct_mutex);
-
-   ctx = idr_find(&file_priv->context_idr, id);
-   if (!ctx)
-   return ERR_PTR(-ENOENT);
+   rcu_read_lock();
+   ctx = __i915_gem_context_lookup_rcu(file_priv, id);
+   if (ctx && !kref_get_unless_zero(&ctx->ref))
+   ctx = NULL;
+   rcu_read_unlock();
 
return ctx;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 2ad78bb50137..198b0064a3d3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -187,7 +187,7 @@ static void i915_gem_context_free(struct i915_gem_context 
*ctx)
list_del(&ctx->link);
 
ida_simple_remove(&ctx->i915->contexts.hw_ida, ctx->hw_id);
-   kfree(ctx);
+   kfree_rcu(ctx, rcu);
 }
 
 static void contexts_free(struct drm_i915_private *i915)
@@ -1021,20 +1021,19 @@ int i915_gem_context_destroy_ioctl(struct drm_device 
*dev, void *data,
if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
return -ENOENT;
 
-   ret = i915_mutex_lock_interruptible(dev);
-   if (ret)
-   return ret;
-
ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-   if (IS_ERR(ctx)) {
-   mutex_unlock(&dev->struct_mutex);
-   return PTR_ERR(ctx);
-   }
+   if (!ctx)
+   return -ENOENT;
+
+   ret = mutex_lock_interruptible(&dev->struct_mutex);
+   if (ret)
+   goto out;
 
__destroy_hw_context(ctx, file_priv);
mutex_unlock(&dev->struct_mutex);
 
-   DRM_DEBUG("HW context %d destroyed\n", args->ctx_id);
+out:
+   i915_gem_context_put(ctx);
return 0;
 }
 
@@ -1044,17 +1043,11 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
*dev, void *data,
struct drm_i915_file_private *file_priv = file->driver_priv;
struct drm_i915_gem_context_param *args = data;
struct i915_gem_context *ctx;
-   int ret;
-
-   ret = i915_mutex_lock_interruptible(dev);
-   if (ret)
-   return ret;
+   int ret = 0;
 
ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-   if (IS_ERR(ctx)) {
-   mutex_unlock(&dev->struct_mutex);
-   return PTR_ERR(ctx);
-   }
+   if (!ctx)
+   return -ENOENT;
 
args->size = 0;
switch (args->param) {
@@ -1085,8 +1078,8 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
*dev, void *data,
ret = -EINVAL;
break;
}
-   mutex_unlock(&dev->struct_mutex);
 
+   i915_gem_context_put(ctx);
return ret;
 }
 
@@ -1098,15 +1091,13 @@ int i915_gem_context_setparam_ioctl(struct drm_device 
*dev, void *data,
struct i915_gem_context *ctx;
int ret;
 
+   ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+   if (!ctx)
+   return -ENOENT;
+
ret = i915_mutex_lock_interruptible(dev);
if (ret)
-   return ret;
-
-   ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-   if (IS_ERR(ctx)) {
-   mutex_unlock(&dev->struct_mutex);
-   return PTR_ERR(ctx);
-   }
+   goto out;
 
switch (args->param) {
case I915_CONTEXT_PARAM_BAN_PERIOD:
@@ -1164,6 +1155,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device 
*dev, void *data,
}
mutex_unlock(&dev->struct_mutex);
 
+out:
+   i915_gem_context_put(ctx);
return ret;
 }
 
@@ -1181,27 +1174,31 @@ int i915_gem_context_reset_stats_ioctl(struct 
drm_device *dev,
if (args->ctx_id == DEFAULT_CONTEXT_HANDLE && !capable(CAP_SYS_ADMIN))
return -EPERM;
 
- 

[Intel-gfx] [PATCH 09/24] drm/i915: Pass vma to relocate entry

2017-05-18 Thread Chris Wilson
We can simplify our tracking of pending writes in an execbuf to the
single bit in vma->exec_entry->flags, but that requires the relocation
function to know the object's vma. Pass it along.

Note we have only been using a single bit to track flushing since

commit cc889e0f6ce6a63c62db17d702ecfed86d58083f
Author: Daniel Vetter 
Date:   Wed Jun 13 20:45:19 2012 +0200

drm/i915: disable flushing_list/gpu_write_list

unconditionally flushed all render caches before the breadcrumb and

commit 6ac42f4148bc27e5ffd18a9ab0eac57f58822af4
Author: Daniel Vetter 
Date:   Sat Jul 21 12:25:01 2012 +0200

drm/i915: Replace the complex flushing logic with simple invalidate/flush 
all

did away with the explicit GPU domain tracking. This was then codified
into the ABI with NO_RELOC in

commit ed5982e6ce5f106abcbf071f80730db344a6da42
Author: Daniel Vetter  # Oi! Patch stealer!
Date:   Thu Jan 17 22:23:36 2013 +0100

drm/i915: Allow userspace to hint that the relocations were known

Signed-off-by: Chris Wilson 
Reviewed-by: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 102 -
 1 file changed, 42 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b21cec236b98..039ec1fb3e03 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -623,42 +623,25 @@ relocate_entry(struct drm_i915_gem_object *obj,
 }
 
 static int
-eb_relocate_entry(struct drm_i915_gem_object *obj,
+eb_relocate_entry(struct i915_vma *vma,
  struct i915_execbuffer *eb,
  struct drm_i915_gem_relocation_entry *reloc)
 {
-   struct drm_gem_object *target_obj;
-   struct drm_i915_gem_object *target_i915_obj;
-   struct i915_vma *target_vma;
-   uint64_t target_offset;
+   struct i915_vma *target;
+   u64 target_offset;
int ret;
 
/* we've already hold a reference to all valid objects */
-   target_vma = eb_get_vma(eb, reloc->target_handle);
-   if (unlikely(target_vma == NULL))
+   target = eb_get_vma(eb, reloc->target_handle);
+   if (unlikely(!target))
return -ENOENT;
-   target_i915_obj = target_vma->obj;
-   target_obj = &target_vma->obj->base;
-
-   target_offset = gen8_canonical_addr(target_vma->node.start);
-
-   /* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
-* pipe_control writes because the gpu doesn't properly redirect them
-* through the ppgtt for non_secure batchbuffers. */
-   if (unlikely(IS_GEN6(eb->i915) &&
-reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
-   ret = i915_vma_bind(target_vma, target_i915_obj->cache_level,
-   PIN_GLOBAL);
-   if (WARN_ONCE(ret, "Unexpected failure to bind target VMA!"))
-   return ret;
-   }
 
/* Validate that the target is in a valid r/w GPU domain */
if (unlikely(reloc->write_domain & (reloc->write_domain - 1))) {
DRM_DEBUG("reloc with multiple write domains: "
- "obj %p target %d offset %d "
+ "target %d offset %d "
  "read %08x write %08x",
- obj, reloc->target_handle,
+ reloc->target_handle,
  (int) reloc->offset,
  reloc->read_domains,
  reloc->write_domain);
@@ -667,43 +650,57 @@ eb_relocate_entry(struct drm_i915_gem_object *obj,
if (unlikely((reloc->write_domain | reloc->read_domains)
 & ~I915_GEM_GPU_DOMAINS)) {
DRM_DEBUG("reloc with read/write non-GPU domains: "
- "obj %p target %d offset %d "
+ "target %d offset %d "
  "read %08x write %08x",
- obj, reloc->target_handle,
+ reloc->target_handle,
  (int) reloc->offset,
  reloc->read_domains,
  reloc->write_domain);
return -EINVAL;
}
 
-   target_obj->pending_read_domains |= reloc->read_domains;
-   target_obj->pending_write_domain |= reloc->write_domain;
+   if (reloc->write_domain)
+   target->exec_entry->flags |= EXEC_OBJECT_WRITE;
+
+   /*
+* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
+* pipe_control writes because the gpu doesn't properly redirect them
+* through the ppgtt for non_secure batchbuffers.
+*/
+   if (unlikely(IS_GEN6(eb->i915) &&
+reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION)) {
+   ret = i915_vma_bind(target, target->obj->cache_level,
+   PIN_GLOBAL);

[Intel-gfx] [PATCH 05/24] drm/i915: Amalgamate execbuffer parameter structures

2017-05-18 Thread Chris Wilson
Combine the two slightly overlapping parameter structures we pass around
the execbuffer routines into one.

Signed-off-by: Chris Wilson 
Reviewed-by: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 550 -
 1 file changed, 233 insertions(+), 317 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 2e5f513087a8..52021bc64754 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -50,70 +50,78 @@
 
 #define BATCH_OFFSET_BIAS (256*1024)
 
-struct i915_execbuffer_params {
-   struct drm_device   *dev;
-   struct drm_file *file;
-   struct i915_vma *batch;
-   u32 dispatch_flags;
-   u32 args_batch_start_offset;
-   struct intel_engine_cs  *engine;
-   struct i915_gem_context *ctx;
-   struct drm_i915_gem_request *request;
-};
+#define __I915_EXEC_ILLEGAL_FLAGS \
+   (__I915_EXEC_UNKNOWN_FLAGS | I915_EXEC_CONSTANTS_MASK)
 
-struct eb_vmas {
+struct i915_execbuffer {
struct drm_i915_private *i915;
+   struct drm_file *file;
+   struct drm_i915_gem_execbuffer2 *args;
+   struct drm_i915_gem_exec_object2 *exec;
+   struct intel_engine_cs *engine;
+   struct i915_gem_context *ctx;
+   struct i915_address_space *vm;
+   struct i915_vma *batch;
+   struct drm_i915_gem_request *request;
+   u32 batch_start_offset;
+   u32 batch_len;
+   unsigned int dispatch_flags;
+   struct drm_i915_gem_exec_object2 shadow_exec_entry;
+   bool need_relocs;
struct list_head vmas;
+   struct reloc_cache {
+   struct drm_mm_node node;
+   unsigned long vaddr;
+   unsigned int page;
+   bool use_64bit_reloc : 1;
+   } reloc_cache;
int and;
union {
-   struct i915_vma *lut[0];
-   struct hlist_head buckets[0];
+   struct i915_vma **lut;
+   struct hlist_head *buckets;
};
 };
 
-static struct eb_vmas *
-eb_create(struct drm_i915_private *i915,
- struct drm_i915_gem_execbuffer2 *args)
+static int
+eb_create(struct i915_execbuffer *eb)
 {
-   struct eb_vmas *eb = NULL;
-
-   if (args->flags & I915_EXEC_HANDLE_LUT) {
-   unsigned size = args->buffer_count;
+   eb->lut = NULL;
+   if (eb->args->flags & I915_EXEC_HANDLE_LUT) {
+   unsigned int size = eb->args->buffer_count;
size *= sizeof(struct i915_vma *);
-   size += sizeof(struct eb_vmas);
-   eb = kmalloc(size, GFP_TEMPORARY | __GFP_NOWARN | 
__GFP_NORETRY);
+   eb->lut = kmalloc(size,
+ GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
}
 
-   if (eb == NULL) {
-   unsigned size = args->buffer_count;
-   unsigned count = PAGE_SIZE / sizeof(struct hlist_head) / 2;
+   if (!eb->lut) {
+   unsigned int size = eb->args->buffer_count;
+   unsigned int count = PAGE_SIZE / sizeof(struct hlist_head) / 2;
BUILD_BUG_ON_NOT_POWER_OF_2(PAGE_SIZE / sizeof(struct 
hlist_head));
while (count > 2*size)
count >>= 1;
-   eb = kzalloc(count*sizeof(struct hlist_head) +
-sizeof(struct eb_vmas),
-GFP_TEMPORARY);
-   if (eb == NULL)
-   return eb;
+   eb->lut = kzalloc(count * sizeof(struct hlist_head),
+ GFP_TEMPORARY);
+   if (!eb->lut)
+   return -ENOMEM;
 
eb->and = count - 1;
-   } else
-   eb->and = -args->buffer_count;
+   } else {
+   eb->and = -eb->args->buffer_count;
+   }
 
-   eb->i915 = i915;
INIT_LIST_HEAD(&eb->vmas);
-   return eb;
+   return 0;
 }
 
 static void
-eb_reset(struct eb_vmas *eb)
+eb_reset(struct i915_execbuffer *eb)
 {
if (eb->and >= 0)
memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
 
 static struct i915_vma *
-eb_get_batch(struct eb_vmas *eb)
+eb_get_batch(struct i915_execbuffer *eb)
 {
struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), 
exec_list);
 
@@ -133,34 +141,30 @@ eb_get_batch(struct eb_vmas *eb)
 }
 
 static int
-eb_lookup_vmas(struct eb_vmas *eb,
-  struct drm_i915_gem_exec_object2 *exec,
-  const struct drm_i915_gem_execbuffer2 *args,
-  struct i915_address_space *vm,
-  struct drm_file *file)
+eb_lookup_vmas(struct i915_execbuffer *eb)
 {
struct drm_i915_gem_object *obj;
struct list_head objects;
int i, ret;
 
INIT_LIST_HEAD(&objec

[Intel-gfx] [PATCH 04/24] drm/i915: Reinstate reservation_object zapping for batch_pool objects

2017-05-18 Thread Chris Wilson
I removed the zapping of the reservation_object->fence array of shared
fences prematurely. We don't yet have the code to zap that array when
retiring the object, and so currently it remains possible to continually
grow the shared array, trapping requests, when reusing the batch_pool
object across many timelines.

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Cc: Joonas Lahtinen 
Cc: Mika Kuoppala 
Cc: Matthew Auld 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_gem_batch_pool.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c 
b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 41aa598c4f3b..c93005c2e0fb 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -114,12 +114,27 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
list_for_each_entry(obj, list, batch_pool_link) {
/* The batches are strictly LRU ordered */
if (i915_gem_object_is_active(obj)) {
-   if (!reservation_object_test_signaled_rcu(obj->resv,
- true))
+   struct reservation_object *resv = obj->resv;
+
+   if (!reservation_object_test_signaled_rcu(resv, true))
break;
 
i915_gem_retire_requests(pool->engine->i915);
GEM_BUG_ON(i915_gem_object_is_active(obj));
+
+   /*
+* The object is now idle, clear the array of shared
+* fences before we add a new request. Although, we
+* remain on the same engine, we may be on a different
+* timeline and so may continually grow the array,
+* trapping a reference to all the old fences, rather
+* than replace the existing fence.
+*/
+   if (rcu_access_pointer(resv->fence)) {
+   reservation_object_lock(resv, NULL);
+   reservation_object_add_excl_fence(resv, NULL);
+   reservation_object_unlock(resv);
+   }
}
 
GEM_BUG_ON(!reservation_object_test_signaled_rcu(obj->resv,
-- 
2.11.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 01/24] drm/i915/selftests: Pretend to be a gfx pci device

2017-05-18 Thread Chris Wilson
Set the class on our mock pci device to GFX. This should be useful for
utilities like intel-iommu that special-case gfx devices.

References: https://bugs.freedesktop.org/show_bug.cgi?id=101080
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/selftests/mock_gem_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c 
b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index f4edd4c6cb07..627e2aa09766 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -121,6 +121,7 @@ struct drm_i915_private *mock_gem_device(void)
goto err;
 
device_initialize(&pdev->dev);
+   pdev->class = PCI_BASE_CLASS_DISPLAY << 16;
pdev->dev.release = release_dev;
dev_set_name(&pdev->dev, "mock");
dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
-- 
2.11.0



[Intel-gfx] [PATCH 08/24] drm/i915: Store a direct lookup from object handle to vma

2017-05-18 Thread Chris Wilson
The advent of full-ppgtt led to an extra indirection between the object
and its binding. That extra indirection has a noticeable impact on how
fast we can convert from the user handles to our internal vma for
execbuffer. In order to bypass the extra indirection, we use a
resizable hashtable to jump from the object to the per-ctx vma.
rhashtable was considered but we don't need the online resizing feature
and the extra complexity proved to undermine its usefulness. Instead, we
simply reallocate the hashtable on demand in a background task and
serialize it before iterating.

In non-full-ppgtt modes, multiple files and multiple contexts can share
the same vma. This leads to having multiple possible handle->vma links,
so we only use the first to establish the fast path. The majority of
buffers are not shared and so we should still be able to realise
speedups with multiple clients.
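
The handle-to-vma fast path amounts to a chained hash table with a
power-of-two bucket count, so the bucket index is a mask rather than a
modulo. A minimal user-space sketch of the idea follows; `struct vma_lut`,
`lut_insert` and `lut_lookup` are invented names, not the driver's API:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the per-context handle->vma lookup. */
struct vma { unsigned int handle; };

struct lut_entry {
	struct vma *vma;
	struct lut_entry *next;
};

struct vma_lut {
	struct lut_entry **buckets;
	unsigned int mask;		/* bucket count - 1 */
};

static struct vma_lut *lut_create(unsigned int bits)
{
	struct vma_lut *lut = malloc(sizeof(*lut));
	unsigned int count = 1u << bits;

	lut->buckets = calloc(count, sizeof(*lut->buckets));
	lut->mask = count - 1;
	return lut;
}

static void lut_insert(struct vma_lut *lut, struct vma *vma)
{
	struct lut_entry *e = malloc(sizeof(*e));
	unsigned int b = vma->handle & lut->mask;

	e->vma = vma;
	e->next = lut->buckets[b];	/* newest entry wins the fast path */
	lut->buckets[b] = e;
}

static struct vma *lut_lookup(const struct vma_lut *lut, unsigned int handle)
{
	const struct lut_entry *e = lut->buckets[handle & lut->mask];

	for (; e; e = e->next)
		if (e->vma->handle == handle)
			return e->vma;
	return NULL;
}
```

Only the lookup fast path is modelled; the resize-in-a-background-task
part described above is omitted.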

v2: Prettier names, more magic.
v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2

Signed-off-by: Chris Wilson 
Reviewed-by: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_debugfs.c   |   6 +
 drivers/gpu/drm/i915/i915_drv.h   |   2 +-
 drivers/gpu/drm/i915/i915_gem.c   |   5 +-
 drivers/gpu/drm/i915/i915_gem_context.c   |  82 +++-
 drivers/gpu/drm/i915/i915_gem_context.h   |  26 +++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c| 264 --
 drivers/gpu/drm/i915/i915_gem_object.h|   4 +-
 drivers/gpu/drm/i915/i915_utils.h |   5 +
 drivers/gpu/drm/i915/i915_vma.c   |  20 ++
 drivers/gpu/drm/i915/i915_vma.h   |   8 +-
 drivers/gpu/drm/i915/selftests/mock_context.c |  12 +-
 11 files changed, 322 insertions(+), 112 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 8abb93994c48..d72442ce4806 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1988,6 +1988,12 @@ static int i915_context_status(struct seq_file *m, void 
*unused)
seq_putc(m, '\n');
}
 
+   seq_printf(m,
+  "\tvma hashtable size=%u (actual %lu), count=%u\n",
+  ctx->vma_lut.ht_size,
+  BIT(ctx->vma_lut.ht_bits),
+  ctx->vma_lut.ht_count);
+
seq_putc(m, '\n');
}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 17883a84b8c1..4f244f9c3d69 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -37,7 +37,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 870659c13de3..4d01089a38f9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3242,6 +3242,10 @@ void i915_gem_close_object(struct drm_gem_object *gem, 
struct drm_file *file)
if (vma->vm->file == fpriv)
i915_vma_close(vma);
 
+   vma = obj->vma_hashed;
+   if (vma && vma->ctx->file_priv == fpriv)
+   i915_vma_unlink_ctx(vma);
+
if (i915_gem_object_is_active(obj) &&
!i915_gem_object_has_active_reference(obj)) {
i915_gem_object_set_active_reference(obj);
@@ -4231,7 +4235,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 
INIT_LIST_HEAD(&obj->global_link);
INIT_LIST_HEAD(&obj->userfault_link);
-   INIT_LIST_HEAD(&obj->obj_exec_link);
INIT_LIST_HEAD(&obj->vma_list);
INIT_LIST_HEAD(&obj->batch_pool_link);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index c5d1666d7071..9af443e69c57 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -85,6 +85,7 @@
  *
  */
 
+#include 
 #include 
 #include 
 #include "i915_drv.h"
@@ -92,6 +93,70 @@
 
 #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
 
+/* Initial size (as log2) to preallocate the handle->object hashtable */
+#define VMA_HT_BITS 2u /* 4 x 2 pointers, 64 bytes minimum */
+
+static void resize_vma_ht(struct work_struct *work)
+{
+   struct i915_gem_context_vma_lut *lut =
+   container_of(work, typeof(*lut), resize);
+   unsigned int bits, new_bits, size, i;
+   struct hlist_head *new_ht;
+
+   GEM_BUG_ON(!(lut->ht_size & I915_CTX_RESIZE_IN_PROGRESS));
+
+   bits = 1 + ilog2(4*lut->ht_count/3 + 1);
+   new_bits = min_t(unsigned int,
+max(bits, VMA_HT_BITS),
+sizeof(unsigned int) * BITS_PER_BYTE - 1);
+   if (new_bits == lut->ht_bits)
+   goto out;
+
+   new_ht = kzalloc(sizeof(*new_ht)

[Intel-gfx] [PATCH 13/24] drm/i915: First try the previous execbuffer location

2017-05-18 Thread Chris Wilson
When choosing a slot for an execbuffer, we ideally want to use the same
address as last time (so that we don't have to rebind it) and the same
address as expected by the user (so that we don't have to fix up any
relocations pointing to it). If we first try to bind the incoming
execbuffer->offset from the user, or the currently bound offset, we
should hopefully achieve the goal of avoiding the rebind cost and the
relocation penalty. However, if the object is not currently bound there
we don't want to arbitrarily unbind an object in our chosen position and
so choose to rebind/relocate the incoming object instead. After we
report the new position back to the user, on the next pass the
relocations should have settled down.
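
The preference order above — reuse the current binding, else try the
offset the user expects, and never evict a neighbour to get it — can be
sketched as follows; the types and the single `busy_at` collision model
are assumptions made for illustration, not the driver's structures:

```c
#include <assert.h>
#include <stdbool.h>

/* A node with size == 0 is unbound; "busy_at" models someone else
 * already occupying the requested offset. */
struct node { unsigned long long start, size; };

#define PIN_NOEVICT 0x1u

/* Offset to try first: the current binding if there is one (no rebind),
 * otherwise the offset userspace expects (no relocation fixups). */
static unsigned long long preferred_offset(const struct node *node,
					   unsigned long long user_offset)
{
	return node->size ? node->start : user_offset;
}

/* With PIN_NOEVICT we never displace an existing binding; the caller
 * falls back to relocating the incoming object instead. */
static bool try_pin(unsigned long long offset, unsigned long long busy_at,
		    unsigned int flags)
{
	if (offset == busy_at && (flags & PIN_NOEVICT))
		return false;
	return true;
}
```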

Signed-off-by: Chris Wilson 
Reviewed-by: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 12 
 drivers/gpu/drm/i915/i915_gem_gtt.c|  6 ++
 drivers/gpu/drm/i915/i915_gem_gtt.h|  1 +
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 92616c96a8c6..171a945046c3 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -337,10 +337,15 @@ eb_pin_vma(struct i915_execbuffer *eb,
 {
u64 flags;
 
-   flags = vma->node.start;
-   flags |= PIN_USER | PIN_NONBLOCK | PIN_OFFSET_FIXED;
+   if (vma->node.size)
+   flags = vma->node.start;
+   else
+   flags = entry->offset & PIN_OFFSET_MASK;
+
+   flags |= PIN_USER | PIN_NOEVICT | PIN_OFFSET_FIXED;
if (unlikely(entry->flags & EXEC_OBJECT_NEEDS_GTT))
flags |= PIN_GLOBAL;
+
if (unlikely(i915_vma_pin(vma, 0, 0, flags)))
return;
 
@@ -471,8 +476,7 @@ eb_add_vma(struct i915_execbuffer *eb,
entry->flags |= eb->context_flags;
 
err = 0;
-   if (vma->node.size)
-   eb_pin_vma(eb, entry, vma);
+   eb_pin_vma(eb, entry, vma);
if (eb_vma_misplaced(entry, vma)) {
eb_unreserve_vma(vma, entry);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a9d78ebcabfe..b6a798e4d6f2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3302,6 +3302,9 @@ int i915_gem_gtt_reserve(struct i915_address_space *vm,
if (err != -ENOSPC)
return err;
 
+   if (flags & PIN_NOEVICT)
+   return -ENOSPC;
+
err = i915_gem_evict_for_node(vm, node, flags);
if (err == 0)
err = drm_mm_reserve_node(&vm->mm, node);
@@ -3416,6 +3419,9 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
if (err != -ENOSPC)
return err;
 
+   if (flags & PIN_NOEVICT)
+   return -ENOSPC;
+
/* No free space, pick a slot at random.
 *
 * There is a pathological case here using a GTT shared between
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index fb15684c1d83..a528ce1380fd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -588,6 +588,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 #define PIN_MAPPABLE   BIT(1)
 #define PIN_ZONE_4GBIT(2)
 #define PIN_NONFAULT   BIT(3)
+#define PIN_NOEVICTBIT(4)
 
 #define PIN_MBZBIT(5) /* I915_VMA_PIN_OVERFLOW */
 #define PIN_GLOBAL BIT(6) /* I915_VMA_GLOBAL_BIND */
-- 
2.11.0



[Intel-gfx] [PATCH 10/24] drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations

2017-05-18 Thread Chris Wilson
If we write a relocation into the buffer, we require our own implicit
synchronisation added after the start of the execbuf, outside of the
user's control. As we may end up clflushing, or doing the patch itself
on the GPU asynchronously, we need to look at the implicit serialisation
on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this
object.

If the user does trigger a stall for relocations, we make sure the stall
is complete enough so that the batch is not submitted before we complete
those relocations.
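
The mechanism reduces to masking off the user's flag once a relocation
write is queued. A trivial sketch, with flag values that should be
checked against include/uapi/drm/i915_drm.h rather than trusted from here:

```c
#include <assert.h>

/* Illustrative flag values; the authoritative definitions live in
 * include/uapi/drm/i915_drm.h. */
#define EXEC_OBJECT_WRITE	(1u << 2)
#define EXEC_OBJECT_ASYNC	(1u << 6)

/* Once a relocation write is pending on an object, implicit
 * synchronisation is mandatory: mask off the user's opt-out. */
static unsigned int flags_after_reloc(unsigned int flags)
{
	return flags & ~EXEC_OBJECT_ASYNC;
}
```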

Fixes: 77ae9957897d ("drm/i915: Enable userspace to opt-out of implicit 
fencing")
Signed-off-by: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Jason Ekstrand 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 039ec1fb3e03..32542205060d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -700,6 +700,16 @@ eb_relocate_entry(struct i915_vma *vma,
return -EINVAL;
}
 
+   /*
+* If we write into the object, we need to force the synchronisation
+* barrier, either with an asynchronous clflush or if we executed the
+* patching using the GPU (though that should be serialised by the
+* timeline). To be completely sure, and since we are required to
+* do relocations we are already stalling, disable the user's opt-out
+* of our synchronisation.
+*/
+   vma->exec_entry->flags &= ~EXEC_OBJECT_ASYNC;
+
ret = relocate_entry(vma->obj, reloc, &eb->reloc_cache, target_offset);
if (ret)
return ret;
-- 
2.11.0



[Intel-gfx] [PATCH 11/24] drm/i915: Eliminate lots of iterations over the execobjects array

2017-05-18 Thread Chris Wilson
The major scaling bottleneck in execbuffer is the processing of the
execobjects. Creating an auxiliary list is inefficient when compared to
using the execobject array we already have allocated.

Reservation is then split into phases. As we look up the VMA, we try
to bind it back into its active location. Only if that fails do we add
it to the unbound list for phase 2. In phase 2, we try to add all those
objects that could not fit into their previous location, with fallback
to retrying all objects and evicting the VM in case of severe
fragmentation. (This is the same as before, except that phase 1 is now
done inline with looking up the VMA to avoid an iteration over the
execobject array. In the ideal case, we eliminate the separate reservation
phase). During the reservation phase, we only evict from the VM between
passes (rather than currently as we try to fit every new VMA). In
testing with Unreal Engine's Atlantis demo, which stresses the eviction
logic on gen7 class hardware, this speeds up the framerate by a factor of
2.

The second loop amalgamation is between move_to_gpu and move_to_active.
As we always submit the request, even if incomplete, we can use the
current request to track active VMA as we perform the flushes and
synchronisation required.

The next big advancement is to avoid copying back to the user any
execobjects and relocations that are not changed.

v2: Add a Theory of Operation spiel.
v3: Fall back to slow relocations in preparation for flushing userptrs.
v4: Document struct members, factor out eb_validate_vma(), add a few
more comments to explain some magic and hide other magic behind macros.

Signed-off-by: Chris Wilson 
Reviewed-by: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_drv.h |2 +-
 drivers/gpu/drm/i915/i915_gem_evict.c   |   92 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c  | 2020 +--
 drivers/gpu/drm/i915/i915_vma.c |2 +-
 drivers/gpu/drm/i915/i915_vma.h |1 +
 drivers/gpu/drm/i915/selftests/i915_gem_evict.c |4 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c   |   16 +-
 7 files changed, 1240 insertions(+), 897 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4f244f9c3d69..8ba120acf080 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3530,7 +3530,7 @@ int __must_check i915_gem_evict_something(struct 
i915_address_space *vm,
 int __must_check i915_gem_evict_for_node(struct i915_address_space *vm,
 struct drm_mm_node *node,
 unsigned int flags);
-int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
+int i915_gem_evict_vm(struct i915_address_space *vm);
 
 /* belongs in i915_gem_gtt.h */
 static inline void i915_gem_chipset_flush(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c 
b/drivers/gpu/drm/i915/i915_gem_evict.c
index 204a2d9288ae..a193f1b36c67 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -50,6 +50,29 @@ static bool ggtt_is_idle(struct drm_i915_private *dev_priv)
return true;
 }
 
+static int ggtt_flush(struct drm_i915_private *i915)
+{
+   int err;
+
+   /* Not everything in the GGTT is tracked via vma (otherwise we
+* could evict as required with minimal stalling) so we are forced
+* to idle the GPU and explicitly retire outstanding requests in
+* the hopes that we can then remove contexts and the like only
+* bound by their active reference.
+*/
+   err = i915_gem_switch_to_kernel_context(i915);
+   if (err)
+   return err;
+
+   err = i915_gem_wait_for_idle(i915,
+I915_WAIT_INTERRUPTIBLE |
+I915_WAIT_LOCKED);
+   if (err)
+   return err;
+
+   return 0;
+}
+
 static bool
 mark_free(struct drm_mm_scan *scan,
  struct i915_vma *vma,
@@ -175,19 +198,7 @@ i915_gem_evict_something(struct i915_address_space *vm,
return intel_has_pending_fb_unpin(dev_priv) ? -EAGAIN : -ENOSPC;
}
 
-   /* Not everything in the GGTT is tracked via vma (otherwise we
-* could evict as required with minimal stalling) so we are forced
-* to idle the GPU and explicitly retire outstanding requests in
-* the hopes that we can then remove contexts and the like only
-* bound by their active reference.
-*/
-   ret = i915_gem_switch_to_kernel_context(dev_priv);
-   if (ret)
-   return ret;
-
-   ret = i915_gem_wait_for_idle(dev_priv,
-I915_WAIT_INTERRUPTIBLE |
-I915_WAIT_LOCKED);
+   ret = ggtt_flush(dev_priv);
if (ret)
return ret;
 
@@ -337,10 +348,8 @@ int i915_gem_evict_

[Intel-gfx] [PATCH 06/24] drm/i915: Use vma->exec_entry as our double-entry placeholder

2017-05-18 Thread Chris Wilson
This has the benefit of not requiring us to manipulate the
vma->exec_link list when tearing down the execbuffer, and provides a
marginally cheaper test to detect the user error.

Signed-off-by: Chris Wilson 
Reviewed-by: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_gem_evict.c  | 17 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 77 --
 drivers/gpu/drm/i915/i915_vma.c|  1 -
 3 files changed, 44 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c 
b/drivers/gpu/drm/i915/i915_gem_evict.c
index 51e365f70464..891247d79299 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -59,9 +59,6 @@ mark_free(struct drm_mm_scan *scan,
if (i915_vma_is_pinned(vma))
return false;
 
-   if (WARN_ON(!list_empty(&vma->exec_list)))
-   return false;
-
if (flags & PIN_NONFAULT && !list_empty(&vma->obj->userfault_link))
return false;
 
@@ -160,8 +157,6 @@ i915_gem_evict_something(struct i915_address_space *vm,
list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
ret = drm_mm_scan_remove_block(&scan, &vma->node);
BUG_ON(ret);
-
-   INIT_LIST_HEAD(&vma->exec_list);
}
 
/* Can we unpin some objects such as idle hw contents,
@@ -209,17 +204,12 @@ i915_gem_evict_something(struct i915_address_space *vm,
if (drm_mm_scan_remove_block(&scan, &vma->node))
__i915_vma_pin(vma);
else
-   list_del_init(&vma->exec_list);
+   list_del(&vma->exec_list);
}
 
/* Unbinding will emit any required flushes */
ret = 0;
-   while (!list_empty(&eviction_list)) {
-   vma = list_first_entry(&eviction_list,
-  struct i915_vma,
-  exec_list);
-
-   list_del_init(&vma->exec_list);
+   list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
__i915_vma_unpin(vma);
if (ret == 0)
ret = i915_vma_unbind(vma);
@@ -315,7 +305,7 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
}
 
/* Overlap of objects in the same batch? */
-   if (i915_vma_is_pinned(vma) || !list_empty(&vma->exec_list)) {
+   if (i915_vma_is_pinned(vma)) {
ret = -ENOSPC;
if (vma->exec_entry &&
vma->exec_entry->flags & EXEC_OBJECT_PINNED)
@@ -336,7 +326,6 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
}
 
list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
-   list_del_init(&vma->exec_list);
__i915_vma_unpin(vma);
if (ret == 0)
ret = i915_vma_unbind(vma);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 52021bc64754..fee86e62c934 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -109,13 +109,40 @@ eb_create(struct i915_execbuffer *eb)
eb->and = -eb->args->buffer_count;
}
 
-   INIT_LIST_HEAD(&eb->vmas);
return 0;
 }
 
+static inline void
+__eb_unreserve_vma(struct i915_vma *vma,
+  const struct drm_i915_gem_exec_object2 *entry)
+{
+   if (unlikely(entry->flags & __EXEC_OBJECT_HAS_FENCE))
+   i915_vma_unpin_fence(vma);
+
+   if (entry->flags & __EXEC_OBJECT_HAS_PIN)
+   __i915_vma_unpin(vma);
+}
+
+static void
+eb_unreserve_vma(struct i915_vma *vma)
+{
+   struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+
+   __eb_unreserve_vma(vma, entry);
+   entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
+}
+
 static void
 eb_reset(struct i915_execbuffer *eb)
 {
+   struct i915_vma *vma;
+
+   list_for_each_entry(vma, &eb->vmas, exec_list) {
+   eb_unreserve_vma(vma);
+   i915_vma_put(vma);
+   vma->exec_entry = NULL;
+   }
+
if (eb->and >= 0)
memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
@@ -147,6 +174,8 @@ eb_lookup_vmas(struct i915_execbuffer *eb)
struct list_head objects;
int i, ret;
 
+   INIT_LIST_HEAD(&eb->vmas);
+
INIT_LIST_HEAD(&objects);
spin_lock(&eb->file->table_lock);
/* Grab a reference to the object and release the lock so we can lookup
@@ -253,40 +282,23 @@ static struct i915_vma *eb_get_vma(struct i915_execbuffer 
*eb, unsigned long han
}
 }
 
-static void
-eb_unreserve_vma(struct i915_vma *vma)
-{
-   struct drm_i915_gem_exec_object2 *entry;
-
-   if (!drm_mm_node_allocated(&vma->node))
-   return;
-

[Intel-gfx] [PATCH 18/24] drm/i915: Convert execbuf to use struct-of-array packing for critical fields

2017-05-18 Thread Chris Wilson
When userspace is doing most of the work, avoiding relocs (using
NO_RELOC) and opting out of implicit synchronisation (using ASYNC), we
still spend a lot of time processing the arrays in execbuf, even though
we should now have nothing to do most of the time. One issue that
becomes readily apparent in profiling anv is that iterating over the
large execobj[] is unfriendly to the loop prefetchers of the CPU, which
much prefer iterating over a pair of arrays rather than one big array.
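
The difference between the two layouts can be shown in plain C; the
struct shapes and `FLAG_PINNED` are invented for the sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-structs: each iteration drags a whole, mostly cold entry
 * through the cache (sizes illustrative). */
struct execobj_aos {
	void *vma;
	unsigned int flags;
	char cold[48];		/* rest of the per-object state */
};

/* Struct-of-arrays: the hot loop walks two dense arrays instead,
 * which is what the CPU's stride prefetchers like. */
struct execbuf_soa {
	void **vma;
	unsigned int *flags;
	size_t count;
};

#define FLAG_PINNED 0x1u	/* illustrative flag */

static size_t count_pinned(const struct execbuf_soa *eb)
{
	size_t i, n = 0;

	for (i = 0; i < eb->count; i++)	/* sequential, prefetch-friendly */
		n += !!(eb->flags[i] & FLAG_PINNED);
	return n;
}
```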

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem_evict.c  |   4 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 224 +++--
 drivers/gpu/drm/i915/i915_vma.h|   2 +-
 3 files changed, 118 insertions(+), 112 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c 
b/drivers/gpu/drm/i915/i915_gem_evict.c
index a193f1b36c67..4df039ef2ce3 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -318,8 +318,8 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
/* Overlap of objects in the same batch? */
if (i915_vma_is_pinned(vma)) {
ret = -ENOSPC;
-   if (vma->exec_entry &&
-   vma->exec_entry->flags & EXEC_OBJECT_PINNED)
+   if (vma->exec_flags &&
+   *vma->exec_flags & EXEC_OBJECT_PINNED)
ret = -EINVAL;
break;
}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f5a0e419bfec..d6eb38b12976 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -47,13 +47,13 @@ enum {
 #define DBG_FORCE_RELOC 0 /* choose one of the above! */
 };
 
-#define  __EXEC_OBJECT_HAS_REF BIT(31)
-#define  __EXEC_OBJECT_HAS_PIN BIT(30)
-#define  __EXEC_OBJECT_HAS_FENCE   BIT(29)
+#define  __LUT_HAS_REF BIT(31)
+#define  __LUT_HAS_PIN BIT(30)
+#define  __LUT_HAS_FENCE   BIT(29)
 #define  __EXEC_OBJECT_NEEDS_MAP   BIT(28)
 #define  __EXEC_OBJECT_NEEDS_BIAS  BIT(27)
-#define  __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above */
-#define __EB_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
+#define  __LUT_INTERNAL_FLAGS (~0u << 27) /* all of the above */
+#define __LUT_RESERVED (__LUT_HAS_PIN | __LUT_HAS_FENCE)
 
 #define __EXEC_HAS_RELOC   BIT(31)
 #define __EXEC_VALIDATED   BIT(30)
@@ -191,6 +191,8 @@ struct i915_execbuffer {
struct drm_file *file; /** per-file lookup tables and limits */
struct drm_i915_gem_execbuffer2 *args; /** ioctl parameters */
struct drm_i915_gem_exec_object2 *exec; /** ioctl execobj[] */
+   struct i915_vma **vma;
+   unsigned int *flags;
 
struct intel_engine_cs *engine; /** engine to queue the request to */
struct i915_gem_context *ctx; /** context for building the request */
@@ -245,14 +247,6 @@ struct i915_execbuffer {
 };
 
 /*
- * As an alternative to creating a hashtable of handle-to-vma for a batch,
- * we used the last available reserved field in the execobject[] and stash
- * a link from the execobj to its vma.
- */
-#define __exec_to_vma(ee) (ee)->rsvd2
-#define exec_to_vma(ee) u64_to_ptr(struct i915_vma, __exec_to_vma(ee))
-
-/*
  * Used to convert any address to canonical form.
  * Starting from gen8, some commands (e.g. STATE_BASE_ADDRESS,
  * MI_LOAD_REGISTER_MEM and others, see Broadwell PRM Vol2a) require the
@@ -316,7 +310,7 @@ static bool
 eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry,
 const struct i915_vma *vma)
 {
-   if (!(entry->flags & __EXEC_OBJECT_HAS_PIN))
+   if (!(*vma->exec_flags & __LUT_HAS_PIN))
return true;
 
if (vma->node.size < entry->pad_to_size)
@@ -342,7 +336,7 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 
*entry,
 
 static inline void
 eb_pin_vma(struct i915_execbuffer *eb,
-  struct drm_i915_gem_exec_object2 *entry,
+  const struct drm_i915_gem_exec_object2 *entry,
   struct i915_vma *vma)
 {
u64 flags;
@@ -366,31 +360,28 @@ eb_pin_vma(struct i915_execbuffer *eb,
}
 
if (i915_vma_pin_fence(vma))
-   entry->flags |= __EXEC_OBJECT_HAS_FENCE;
+   *vma->exec_flags |= __LUT_HAS_FENCE;
}
 
-   entry->flags |= __EXEC_OBJECT_HAS_PIN;
+   *vma->exec_flags |= __LUT_HAS_PIN;
 }
 
-static inline void
-__eb_unreserve_vma(struct i915_vma *vma,
-  const struct drm_i915_gem_exec_object2 *entry)
+static inline void __eb_unreserve_vma(struct i915_vma *vma, unsigned int flags)
 {
-   GEM_BUG_ON(!(entry->flags & __EXEC_OBJECT_HAS_PIN));
+   GEM_BUG_ON(!(flags & __LUT_HAS_PIN));
 
-   if (unlikely(entry->flags & __EXEC_OBJECT_HAS_FENCE))
+

[Intel-gfx] [PATCH 15/24] drm/i915: Allow execbuffer to use the first object as the batch

2017-05-18 Thread Chris Wilson
Currently, the last object in the execlist is always the batch.
However, when building the batch buffer we often know the batch object
first and if we can use the first slot in the execlist we can emit
relocation instructions relative to it immediately and avoid a separate
pass to adjust the relocations to point to the last execlist slot.
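
The index selection itself is a one-liner; this user-space restatement
of `eb_batch_index()` takes the flag value from the uapi hunk in this
patch:

```c
#include <assert.h>

#define I915_EXEC_BATCH_FIRST (1u << 18)	/* from the uapi hunk below */

/* Which execobject slot holds the batch: slot 0 with BATCH_FIRST,
 * otherwise the traditional final slot. */
static unsigned int batch_index(unsigned int exec_flags,
				unsigned int buffer_count)
{
	return (exec_flags & I915_EXEC_BATCH_FIRST) ? 0 : buffer_count - 1;
}
```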

Signed-off-by: Chris Wilson 
Reviewed-by: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_drv.c|  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 -
 include/uapi/drm/i915_drm.h| 18 +-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index dc8f3144ca99..178550a93884 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -351,6 +351,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
case I915_PARAM_HAS_EXEC_ASYNC:
case I915_PARAM_HAS_EXEC_FENCE:
case I915_PARAM_HAS_EXEC_CAPTURE:
+   case I915_PARAM_HAS_EXEC_BATCH_FIRST:
/* For the time being all of these are always true;
 * if some supported hardware does not have one of these
 * features this value needs to be provided from
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 07ea00ea0b55..bc2bb8b19bbf 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -650,7 +650,10 @@ ht_needs_resize(const struct i915_gem_context_vma_lut *lut)
 
 static unsigned int eb_batch_index(const struct i915_execbuffer *eb)
 {
-   return eb->buffer_count - 1;
+   if (eb->args->flags & I915_EXEC_BATCH_FIRST)
+   return 0;
+   else
+   return eb->buffer_count - 1;
 }
 
 static int eb_select_context(struct i915_execbuffer *eb)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f24a80d2d42e..4adf3e5f5c71 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -418,6 +418,12 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_HAS_EXEC_CAPTURE 45
 
+/*
+ * Query whether DRM_I915_GEM_EXECBUFFER2 supports supplying the batch buffer
+ * as the first execobject as opposed to the last. See I915_EXEC_BATCH_FIRST.
+ */
+#define I915_PARAM_HAS_EXEC_BATCH_FIRST 46
+
 typedef struct drm_i915_getparam {
__s32 param;
/*
@@ -904,7 +910,17 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_OUT(1<<17)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_OUT<<1))
+/*
+ * Traditionally the execbuf ioctl has only considered the final element in
+ * the execobject[] to be the executable batch. Often though, the client
+ * will know the batch object prior to construction, and being able to place
+ * it into the execobject[] array first can simplify the relocation tracking.
+ * Setting I915_EXEC_BATCH_FIRST tells execbuf to use element 0 of the
+ * execobject[] as the batch instead (the default is to use the last
+ * element).
+ */
+#define I915_EXEC_BATCH_FIRST  (1<<18)
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_BATCH_FIRST<<1))
 
 #define I915_EXEC_CONTEXT_ID_MASK  (0x)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
2.11.0



[Intel-gfx] [PATCH 14/24] drm/i915: Wait upon userptr get-user-pages within execbuffer

2017-05-18 Thread Chris Wilson
This simply hides the EAGAIN caused by userptr when userspace causes
resource contention. However, it is quite beneficial with highly
contended userptr users as we avoid repeating the setup costs and
kernel-user context switches.
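
The flush-before-retry pattern can be modelled in a few lines of
user-space C; everything here (the `toy_*` names, the errno stand-in)
is illustrative:

```c
#include <assert.h>

/* Pending get-user-pages work sits on a queue; flushing it before the
 * retry avoids bouncing -EAGAIN back to userspace. */
#define TOY_EAGAIN 11

struct toy_workqueue { int pending; };

static void toy_flush_workqueue(struct toy_workqueue *wq)
{
	wq->pending = 0;	/* run all queued work to completion */
}

static int toy_lookup_pages(const struct toy_workqueue *wq)
{
	return wq->pending ? -TOY_EAGAIN : 0;
}

static int toy_relocate_slow(struct toy_workqueue *wq)
{
	/* A frequent cause of EAGAIN is currently unavailable client
	 * pages, so drain the userptr workers before retrying. */
	toy_flush_workqueue(wq);
	return toy_lookup_pages(wq);
}
```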

Signed-off-by: Chris Wilson 
Reviewed-by: Michał Winiarski 
---
 drivers/gpu/drm/i915/i915_drv.c|  1 +
 drivers/gpu/drm/i915/i915_drv.h| 10 +-
 drivers/gpu/drm/i915/i915_gem.c|  4 +++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
 drivers/gpu/drm/i915/i915_gem_userptr.c| 18 +++---
 5 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 72fb47a439d2..dc8f3144ca99 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -553,6 +553,7 @@ static void i915_gem_fini(struct drm_i915_private *dev_priv)
intel_uc_fini_hw(dev_priv);
i915_gem_cleanup_engines(dev_priv);
i915_gem_context_fini(dev_priv);
+   i915_gem_cleanup_userptr(dev_priv);
mutex_unlock(&dev_priv->drm.struct_mutex);
 
i915_gem_drain_freed_objects(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8ba120acf080..261de5f38956 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1452,6 +1452,13 @@ struct i915_gem_mm {
/** LRU list of objects with fence regs on them. */
struct list_head fence_list;
 
+   /**
+* Workqueue to fault in userptr pages, flushed by the execbuf
+* when required but otherwise left to userspace to try again
+* on EAGAIN.
+*/
+   struct workqueue_struct *userptr_wq;
+
u64 unordered_timeline;
 
/* the indicator for dispatch video commands on two BSD rings */
@@ -3180,7 +3187,8 @@ int i915_gem_set_tiling_ioctl(struct drm_device *dev, 
void *data,
  struct drm_file *file_priv);
 int i915_gem_get_tiling_ioctl(struct drm_device *dev, void *data,
  struct drm_file *file_priv);
-void i915_gem_init_userptr(struct drm_i915_private *dev_priv);
+int i915_gem_init_userptr(struct drm_i915_private *dev_priv);
+void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv);
 int i915_gem_userptr_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file);
 int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4d01089a38f9..827dfe7b6937 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4782,7 +4782,9 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 */
intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 
-   i915_gem_init_userptr(dev_priv);
+   ret = i915_gem_init_userptr(dev_priv);
+   if (ret)
+   goto out_unlock;
 
ret = i915_gem_init_ggtt(dev_priv);
if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 171a945046c3..07ea00ea0b55 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1529,6 +1529,9 @@ static int eb_relocate_slow(struct i915_execbuffer *eb)
goto out;
}
 
+   /* A frequent cause for EAGAIN are currently unavailable client pages */
+   flush_workqueue(eb->i915->mm.userptr_wq);
+
err = i915_mutex_lock_interruptible(dev);
if (err) {
mutex_lock(&dev->struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 4ec9a04aa165..d89163b5d021 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -378,7 +378,7 @@ __i915_mm_struct_free(struct kref *kref)
mutex_unlock(&mm->i915->mm_lock);
 
INIT_WORK(&mm->work, __i915_mm_struct_free__worker);
-   schedule_work(&mm->work);
+   queue_work(mm->i915->mm.userptr_wq, &mm->work);
 }
 
 static void
@@ -598,7 +598,7 @@ __i915_gem_userptr_get_pages_schedule(struct 
drm_i915_gem_object *obj)
get_task_struct(work->task);
 
INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
-   schedule_work(&work->work);
+   queue_work(to_i915(obj->base.dev)->mm.userptr_wq, &work->work);
 
return ERR_PTR(-EAGAIN);
 }
@@ -830,8 +830,20 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, 
struct drm_file *file
return 0;
 }
 
-void i915_gem_init_userptr(struct drm_i915_private *dev_priv)
+int i915_gem_init_userptr(struct drm_i915_private *dev_priv)
 {
mutex_init(&dev_priv->mm_lock);
hash_init(dev_priv->mm_structs);
+
+   dev_priv->mm.userptr_wq =
+   alloc_workqueue("i915-userptr-acquire", WQ_HIGHPRI, 0);
+   if (!dev_priv->mm.userptr_wq)
+   

[Intel-gfx] [PATCH 17/24] drm/i915: Stash a pointer to the obj's resv in the vma

2017-05-18 Thread Chris Wilson
During execbuf, a mandatory step is that we add this request (this
fence) to each object's reservation_object. Inside execbuf, we track the
vma, and to add the fence to the reservation_object then means having to
first chase the obj, incurring another cache miss. We can reduce the
number of cache misses by stashing a pointer to the reservation_object
in the vma itself.
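The pointer chase being removed can be illustrated with a minimal sketch. The structs below are invented stand-ins, not the real i915_vma / drm_i915_gem_object layouts:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the stashing idea: copying obj->resv into the vma at
 * creation lets hot paths dereference vma->resv directly instead of
 * the two-step vma->obj->resv chase (one fewer cache line touched). */
struct toy_resv { int excl_fence; };
struct toy_obj  { struct toy_resv *resv; };
struct toy_vma  {
    struct toy_obj  *obj;
    struct toy_resv *resv;   /* stashed copy of obj->resv */
};

static void toy_vma_create(struct toy_vma *vma, struct toy_obj *obj)
{
    vma->obj  = obj;
    vma->resv = obj->resv;   /* stash once, at creation time */
}

/* Hot path: one dereference instead of two. */
static void toy_add_excl_fence(struct toy_vma *vma, int fence)
{
    vma->resv->excl_fence = fence;
}
```

The stash is valid for the vma's lifetime because a vma never outlives its object, which is why a plain pointer copy (no extra reference) suffices in this model.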

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25 -
 drivers/gpu/drm/i915/i915_vma.c|  1 +
 drivers/gpu/drm/i915/i915_vma.h|  3 ++-
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 9715099664ac..f5a0e419bfec 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1207,17 +1207,17 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
if (err)
goto err_request;
 
-   GEM_BUG_ON(!reservation_object_test_signaled_rcu(obj->resv, true));
+   GEM_BUG_ON(!reservation_object_test_signaled_rcu(batch->resv, true));
i915_vma_move_to_active(batch, rq, 0);
-   reservation_object_lock(obj->resv, NULL);
-   reservation_object_add_excl_fence(obj->resv, &rq->fence);
-   reservation_object_unlock(obj->resv);
+   reservation_object_lock(batch->resv, NULL);
+   reservation_object_add_excl_fence(batch->resv, &rq->fence);
+   reservation_object_unlock(batch->resv);
i915_vma_unpin(batch);
 
i915_vma_move_to_active(vma, rq, true);
-   reservation_object_lock(vma->obj->resv, NULL);
-   reservation_object_add_excl_fence(vma->obj->resv, &rq->fence);
-   reservation_object_unlock(vma->obj->resv);
+   reservation_object_lock(vma->resv, NULL);
+   reservation_object_add_excl_fence(vma->resv, &rq->fence);
+   reservation_object_unlock(vma->resv);
 
rq->batch = batch;
 
@@ -1267,7 +1267,6 @@ relocate_entry(struct i915_vma *vma,
   struct i915_execbuffer *eb,
   const struct i915_vma *target)
 {
-   struct drm_i915_gem_object *obj = vma->obj;
u64 offset = reloc->offset;
u64 target_offset = relocation_target(reloc, target);
bool wide = eb->reloc_cache.use_64bit_reloc;
@@ -1275,7 +1274,7 @@ relocate_entry(struct i915_vma *vma,
 
if (!eb->reloc_cache.vaddr &&
(DBG_FORCE_RELOC == FORCE_GPU_RELOC ||
-!reservation_object_test_signaled_rcu(obj->resv, true))) {
+!reservation_object_test_signaled_rcu(vma->resv, true))) {
const unsigned int gen = eb->reloc_cache.gen;
unsigned int len;
u32 *batch;
@@ -1335,7 +1334,7 @@ relocate_entry(struct i915_vma *vma,
}
 
 repeat:
-   vaddr = reloc_vaddr(obj, &eb->reloc_cache, offset >> PAGE_SHIFT);
+   vaddr = reloc_vaddr(vma->obj, &eb->reloc_cache, offset >> PAGE_SHIFT);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
 
@@ -1802,11 +1801,11 @@ static int eb_relocate_slow(struct i915_execbuffer *eb)
return err ?: have_copy;
 }
 
-static void eb_export_fence(struct drm_i915_gem_object *obj,
+static void eb_export_fence(struct i915_vma *vma,
struct drm_i915_gem_request *req,
unsigned int flags)
 {
-   struct reservation_object *resv = obj->resv;
+   struct reservation_object *resv = vma->resv;
 
/*
 * Ignore errors from failing to allocate the new fence, we can't
@@ -1866,7 +1865,7 @@ eb_move_to_gpu(struct i915_execbuffer *eb)
const struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
struct i915_vma *vma = exec_to_vma(entry);
 
-   eb_export_fence(vma->obj, eb->request, entry->flags);
+   eb_export_fence(vma, eb->request, entry->flags);
if (unlikely(entry->flags & __EXEC_OBJECT_HAS_REF))
i915_vma_put(vma);
}
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 09b00e9de2f0..fc1a8412b0a0 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -90,6 +90,7 @@ vma_create(struct drm_i915_gem_object *obj,
init_request_active(&vma->last_fence, NULL);
vma->vm = vm;
vma->obj = obj;
+   vma->resv = obj->resv;
vma->size = obj->base.size;
vma->display_alignment = I915_GTT_MIN_ALIGNMENT;
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 062addfee6ef..3840b8cdc6b1 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -50,6 +50,7 @@ struct i915_vma {
struct drm_i915_gem_object *obj;
struct i915_address_space *vm;
struct drm_i915_fence_reg *fence;
+   struct reservation_object *resv;
struct sg_table *pages;
void __iomem *ioma

[Intel-gfx] [PATCH 02/24] drm/i915: Mark CPU cache as dirty on every transition for CPU writes

2017-05-18 Thread Chris Wilson
Currently, we only mark the CPU cache as dirty if we skip a clflush.
This leads to some confusion where we have to ask if the object is in
the write domain or missed a clflush. If we always mark the cache as
dirty, this becomes a much simpler question to answer.

The goal remains to do as few clflushes as required and to do them as
late as possible, in the hope of deferring the work to a kthread and not
blocking the caller (e.g. execbuf, flips).

v2: Always call clflush before GPU execution when the cache_dirty flag
is set. This may cause some extra work on llc systems that migrate dirty
buffers back and forth - but we do try to limit that by only setting
cache_dirty at the end of the gpu sequence.

v3: Always mark the cache as dirty upon a level change, as we need to
invalidate any stale cachelines due to external writes.
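A toy model of the rule, with invented names rather than the driver's actual code, may make the invariant clearer:

```c
#include <assert.h>

/* Toy model of the v3 rule: any cache-level transition marks the CPU
 * cache dirty so stale cachelines get invalidated before the next use;
 * performing the (possibly deferred) clflush is what clears the flag. */
struct toy_gem {
    int cache_level;
    int cache_dirty;
};

static void toy_set_cache_level(struct toy_gem *obj, int level)
{
    obj->cache_level = level;
    obj->cache_dirty = 1;        /* always invalidate stale cachelines */
}

static void toy_clflush(struct toy_gem *obj)
{
    obj->cache_dirty = 0;        /* cache now known clean */
}

/* "In the write domain, or missed a clflush?" collapses to one bit: */
static int toy_needs_clflush(const struct toy_gem *obj)
{
    return obj->cache_dirty;
}
```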

Reported-by: Dongwon Kim 
Fixes: a6a7cc4b7db6 ("drm/i915: Always flush the dirty CPU cache when pinning 
the scanout")
Signed-off-by: Chris Wilson 
Cc: Dongwon Kim 
Cc: Matt Roper 
Tested-by: Dongwon Kim 
---
 drivers/gpu/drm/i915/i915_gem.c  | 76 ++--
 drivers/gpu/drm/i915/i915_gem_clflush.c  | 15 +++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   | 21 +++
 drivers/gpu/drm/i915/i915_gem_internal.c |  3 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c  |  5 +-
 drivers/gpu/drm/i915/selftests/huge_gem_object.c |  3 +-
 6 files changed, 67 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 02adf8241394..155dd52f2d18 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -49,7 +49,7 @@ static void i915_gem_flush_free_objects(struct 
drm_i915_private *i915);
 
 static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 {
-   if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
+   if (obj->cache_dirty)
return false;
 
if (!i915_gem_object_is_coherent(obj))
@@ -233,6 +233,14 @@ i915_gem_object_get_pages_phys(struct drm_i915_gem_object 
*obj)
return st;
 }
 
+static void __start_cpu_write(struct drm_i915_gem_object *obj)
+{
+   obj->base.read_domains = I915_GEM_DOMAIN_CPU;
+   obj->base.write_domain = I915_GEM_DOMAIN_CPU;
+   if (cpu_write_needs_clflush(obj))
+   obj->cache_dirty = true;
+}
+
 static void
 __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj,
struct sg_table *pages,
@@ -248,8 +256,7 @@ __i915_gem_object_release_shmem(struct drm_i915_gem_object 
*obj,
!i915_gem_object_is_coherent(obj))
drm_clflush_sg(pages);
 
-   obj->base.read_domains = I915_GEM_DOMAIN_CPU;
-   obj->base.write_domain = I915_GEM_DOMAIN_CPU;
+   __start_cpu_write(obj);
 }
 
 static void
@@ -684,6 +691,12 @@ i915_gem_dumb_create(struct drm_file *file,
   args->size, &args->handle);
 }
 
+static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
+{
+   return !(obj->cache_level == I915_CACHE_NONE ||
+obj->cache_level == I915_CACHE_WT);
+}
+
 /**
  * Creates a new mm object and returns a handle to it.
  * @dev: drm device pointer
@@ -753,6 +766,11 @@ flush_write_domain(struct drm_i915_gem_object *obj, 
unsigned int flush_domains)
case I915_GEM_DOMAIN_CPU:
i915_gem_clflush_object(obj, I915_CLFLUSH_SYNC);
break;
+
+   case I915_GEM_DOMAIN_RENDER:
+   if (gpu_write_needs_clflush(obj))
+   obj->cache_dirty = true;
+   break;
}
 
obj->base.write_domain = 0;
@@ -854,7 +872,8 @@ int i915_gem_obj_prepare_shmem_read(struct 
drm_i915_gem_object *obj,
 * optimizes for the case when the gpu will dirty the data
 * anyway again before the next pread happens.
 */
-   if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU))
+   if (!obj->cache_dirty &&
+   !(obj->base.read_domains & I915_GEM_DOMAIN_CPU))
*needs_clflush = CLFLUSH_BEFORE;
 
 out:
@@ -906,14 +925,16 @@ int i915_gem_obj_prepare_shmem_write(struct 
drm_i915_gem_object *obj,
 * This optimizes for the case when the gpu will use the data
 * right away and we therefore have to clflush anyway.
 */
-   if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
+   if (!obj->cache_dirty) {
*needs_clflush |= CLFLUSH_AFTER;
 
-   /* Same trick applies to invalidate partially written cachelines read
-* before writing.
-*/
-   if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU))
-   *needs_clflush |= CLFLUSH_BEFORE;
+   /*
+* Same trick applies to invalidate partially written
+* cachelines read before writing.
+*/
+   if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU))
+   *needs_clflush |= CLF

[Intel-gfx] [PATCH 12/24] drm/i915: Store a persistent reference for an object in the execbuffer cache

2017-05-18 Thread Chris Wilson
If we take a reference to the object/vma when it is first used in an
execbuf, we can keep that reference until the object's file-local handle
is closed, thereby saving a frequent ref/unref pair.
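The reference-lifetime change can be sketched as follows. The names are illustrative only; the real bookkeeping uses the __EXEC_OBJECT_HAS_REF flag in the exec entry, not a boolean in the vma:

```c
#include <assert.h>

/* Toy model of the persistent-reference scheme: the first execbuf
 * lookup takes one reference that is kept until the file-local handle
 * closes, so every later execbuf avoids the get/put pair entirely. */
struct toy_vma {
    int refcount;
    int cached;          /* stand-in for __EXEC_OBJECT_HAS_REF state */
};

static void toy_vma_get(struct toy_vma *v) { v->refcount++; }
static void toy_vma_put(struct toy_vma *v) { v->refcount--; }

static void toy_execbuf_lookup(struct toy_vma *v)
{
    if (!v->cached) {            /* first use: take the persistent ref */
        toy_vma_get(v);
        v->cached = 1;
    }
    /* subsequent uses: no reference traffic at all */
}

static void toy_handle_close(struct toy_vma *v)
{
    if (v->cached) {             /* drop the ref with the handle */
        toy_vma_put(v);
        v->cached = 0;
    }
}
```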

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem_context.c|  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 32 --
 drivers/gpu/drm/i915/i915_vma.c|  2 ++
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 9af443e69c57..c62e17a68f5b 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -152,6 +152,7 @@ static void vma_lut_free(struct i915_gem_context *ctx)
hlist_for_each_entry(vma, &lut->ht[i], ctx_node) {
vma->obj->vma_hashed = NULL;
vma->ctx = NULL;
+   i915_vma_put(vma);
}
}
kvfree(lut->ht);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 77c7363754d1..92616c96a8c6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -42,11 +42,12 @@
 
 #define DBG_USE_CPU_RELOC 0 /* -1 force GTT relocs; 1 force CPU relocs */
 
-#define  __EXEC_OBJECT_HAS_PIN BIT(31)
-#define  __EXEC_OBJECT_HAS_FENCE   BIT(30)
-#define  __EXEC_OBJECT_NEEDS_MAP   BIT(29)
-#define  __EXEC_OBJECT_NEEDS_BIAS  BIT(28)
-#define  __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 28) /* all of the above */
+#define  __EXEC_OBJECT_HAS_REF BIT(31)
+#define  __EXEC_OBJECT_HAS_PIN BIT(30)
+#define  __EXEC_OBJECT_HAS_FENCE   BIT(29)
+#define  __EXEC_OBJECT_NEEDS_MAP   BIT(28)
+#define  __EXEC_OBJECT_NEEDS_BIAS  BIT(27)
+#define  __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above */
 #define __EB_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
 
 #define __EXEC_HAS_RELOC   BIT(31)
@@ -445,7 +446,7 @@ eb_add_vma(struct i915_execbuffer *eb,
 * to find the right target VMA when doing relocations.
 */
vma->exec_entry = entry;
-   __exec_to_vma(entry) = (uintptr_t)i915_vma_get(vma);
+   __exec_to_vma(entry) = (uintptr_t)vma;
 
if (eb->lut_size >= 0) {
vma->exec_handle = entry->handle;
@@ -774,11 +775,19 @@ next_vma: ;
GEM_BUG_ON(obj->vma_hashed);
obj->vma_hashed = vma;
}
+
+   i915_vma_get(vma);
}
 
err = eb_add_vma(eb, &eb->exec[i], vma);
if (unlikely(err))
goto err;
+
+   /* Only after we validated the user didn't use our bits */
+   if (vma->ctx != eb->ctx) {
+   i915_vma_get(vma);
+   eb->exec[i].flags |= __EXEC_OBJECT_HAS_REF;
+   }
}
 
if (lut->ht_size & I915_CTX_RESIZE_IN_PROGRESS) {
@@ -851,7 +860,8 @@ static void eb_reset_vmas(const struct i915_execbuffer *eb)
 
eb_unreserve_vma(vma, entry);
vma->exec_entry = NULL;
-   i915_vma_put(vma);
+   if (unlikely(entry->flags & __EXEC_OBJECT_HAS_REF))
+   i915_vma_put(vma);
}
 
if (eb->lut_size >= 0)
@@ -878,7 +888,8 @@ static void eb_release_vmas(const struct i915_execbuffer 
*eb)
if (entry->flags & __EXEC_OBJECT_HAS_PIN)
__eb_unreserve_vma(vma, entry);
vma->exec_entry = NULL;
-   i915_vma_put(vma);
+   if (unlikely(entry->flags & __EXEC_OBJECT_HAS_REF))
+   i915_vma_put(vma);
}
 }
 
@@ -1636,7 +1647,8 @@ eb_move_to_gpu(struct i915_execbuffer *eb)
struct i915_vma *vma = exec_to_vma(entry);
 
eb_export_fence(vma->obj, eb->request, entry->flags);
-   i915_vma_put(vma);
+   if (unlikely(entry->flags & __EXEC_OBJECT_HAS_REF))
+   i915_vma_put(vma);
}
eb->exec = NULL;
 
@@ -1767,7 +1779,7 @@ static struct i915_vma *eb_parse(struct i915_execbuffer 
*eb, bool is_master)
vma->exec_entry =
memset(&eb->exec[eb->buffer_count++],
   0, sizeof(*vma->exec_entry));
-   vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
+   vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_REF;
__exec_to_vma(vma->exec_entry) = (uintptr_t)i915_vma_get(vma);
 
 out:
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 6b1253fdfc39..09b00e9de2f0 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -605,6 +605,8 @@ void i915_vma_unlink_ctx(struct i915_vma *vma)
if (i915_vma_is_ggtt(vma))
  

[Intel-gfx] [PATCH 03/24] drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty

2017-05-18 Thread Chris Wilson
For ease of use (i.e. avoiding a few checks and function calls), store
the object's cache coherency next to the cache-dirty bit.
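The caching pattern itself is simple; below is a hedged sketch with invented names and a deliberately simplified predicate (the real one lives in i915_gem_object_is_coherent() and depends on more than the cache level):

```c
#include <assert.h>

/* Toy model of caching a computed predicate as a bit: recompute
 * coherency only when the cache level changes, so hot paths test a
 * single stored bit instead of calling a helper every time. */
enum toy_level { TOY_CACHE_NONE, TOY_CACHE_LLC, TOY_CACHE_WT };

struct toy_gem {
    enum toy_level cache_level;
    unsigned int cache_coherent : 1;
    unsigned int cache_dirty    : 1;
};

/* Simplified stand-in for i915_gem_object_is_coherent(). */
static int toy_compute_coherent(enum toy_level level, int has_llc)
{
    return has_llc || level != TOY_CACHE_NONE;
}

static void toy_set_level(struct toy_gem *obj, enum toy_level level,
                          int has_llc)
{
    obj->cache_level    = level;
    obj->cache_coherent = toy_compute_coherent(level, has_llc) ? 1 : 0;
    obj->cache_dirty    = 1;     /* always invalidate stale cachelines */
}
```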

Signed-off-by: Chris Wilson 
Cc: Dongwon Kim 
Cc: Matt Roper 
---
 drivers/gpu/drm/i915/i915_gem.c  | 14 +++---
 drivers/gpu/drm/i915/i915_gem_clflush.c  |  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_internal.c |  3 ++-
 drivers/gpu/drm/i915/i915_gem_object.h   |  1 +
 drivers/gpu/drm/i915/i915_gem_stolen.c   |  1 +
 drivers/gpu/drm/i915/i915_gem_userptr.c  |  3 ++-
 drivers/gpu/drm/i915/selftests/huge_gem_object.c |  3 ++-
 8 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 155dd52f2d18..870659c13de3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -52,7 +52,7 @@ static bool cpu_write_needs_clflush(struct 
drm_i915_gem_object *obj)
if (obj->cache_dirty)
return false;
 
-   if (!i915_gem_object_is_coherent(obj))
+   if (!obj->cache_coherent)
return true;
 
return obj->pin_display;
@@ -253,7 +253,7 @@ __i915_gem_object_release_shmem(struct drm_i915_gem_object 
*obj,
 
if (needs_clflush &&
(obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0 &&
-   !i915_gem_object_is_coherent(obj))
+   !obj->cache_coherent)
drm_clflush_sg(pages);
 
__start_cpu_write(obj);
@@ -856,8 +856,7 @@ int i915_gem_obj_prepare_shmem_read(struct 
drm_i915_gem_object *obj,
if (ret)
return ret;
 
-   if (i915_gem_object_is_coherent(obj) ||
-   !static_cpu_has(X86_FEATURE_CLFLUSH)) {
+   if (obj->cache_coherent || !static_cpu_has(X86_FEATURE_CLFLUSH)) {
ret = i915_gem_object_set_to_cpu_domain(obj, false);
if (ret)
goto err_unpin;
@@ -909,8 +908,7 @@ int i915_gem_obj_prepare_shmem_write(struct 
drm_i915_gem_object *obj,
if (ret)
return ret;
 
-   if (i915_gem_object_is_coherent(obj) ||
-   !static_cpu_has(X86_FEATURE_CLFLUSH)) {
+   if (obj->cache_coherent || !static_cpu_has(X86_FEATURE_CLFLUSH)) {
ret = i915_gem_object_set_to_cpu_domain(obj, true);
if (ret)
goto err_unpin;
@@ -3661,6 +3659,7 @@ int i915_gem_object_set_cache_level(struct 
drm_i915_gem_object *obj,
list_for_each_entry(vma, &obj->vma_list, obj_link)
vma->node.color = cache_level;
obj->cache_level = cache_level;
+   obj->cache_coherent = i915_gem_object_is_coherent(obj);
obj->cache_dirty = true; /* Always invalidate stale cachelines */
 
return 0;
@@ -4320,7 +4319,8 @@ i915_gem_object_create(struct drm_i915_private *dev_priv, 
u64 size)
} else
obj->cache_level = I915_CACHE_NONE;
 
-   obj->cache_dirty = !i915_gem_object_is_coherent(obj);
+   obj->cache_coherent = i915_gem_object_is_coherent(obj);
+   obj->cache_dirty = !obj->cache_coherent;
 
trace_i915_gem_object_create(obj);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_clflush.c 
b/drivers/gpu/drm/i915/i915_gem_clflush.c
index 17b207e963c2..152f16c11878 100644
--- a/drivers/gpu/drm/i915/i915_gem_clflush.c
+++ b/drivers/gpu/drm/i915/i915_gem_clflush.c
@@ -139,7 +139,7 @@ void i915_gem_clflush_object(struct drm_i915_gem_object 
*obj,
 * snooping behaviour occurs naturally as the result of our domain
 * tracking.
 */
-   if (!(flags & I915_CLFLUSH_FORCE) && i915_gem_object_is_coherent(obj))
+   if (!(flags & I915_CLFLUSH_FORCE) && obj->cache_coherent)
return;
 
trace_i915_gem_object_clflush(obj);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 0b8ae0f56675..2e5f513087a8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1129,7 +1129,7 @@ i915_gem_execbuffer_move_to_gpu(struct 
drm_i915_gem_request *req,
if (vma->exec_entry->flags & EXEC_OBJECT_ASYNC)
continue;
 
-   if (obj->cache_dirty)
+   if (obj->cache_dirty & ~obj->cache_coherent)
i915_gem_clflush_object(obj, 0);
 
ret = i915_gem_request_await_object
diff --git a/drivers/gpu/drm/i915/i915_gem_internal.c 
b/drivers/gpu/drm/i915/i915_gem_internal.c
index 58e93e87d573..568bf83af1f5 100644
--- a/drivers/gpu/drm/i915/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/i915_gem_internal.c
@@ -191,7 +191,8 @@ i915_gem_object_create_internal(struct drm_i915_private 
*i915,
obj->base.read_domains = I915_GEM_DOMAIN_CPU;
obj->base.write_domain = I915_GEM_DOMAIN_CPU;
obj->cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE

Re: [Intel-gfx] [PATCH v3] drm: Add DRM_MODE_ROTATE_ and DRM_MODE_REFLECT_ to UAPI

2017-05-18 Thread Emil Velikov
Hi Rob,

On 18 May 2017 at 02:39, Robert Foss  wrote:
> Add DRM_MODE_ROTATE_ and DRM_MODE_REFLECT_ defines to the UAPI
> as a convenience.
>
> Ideally the DRM_ROTATE_ and DRM_REFLECT_ property ids are looked up
> through the atomic API, but realistically userspace is likely to take
> shortcuts and assume that the enum values are what is sent over the
> wire.
>
> As a result these defines are provided purely as a convenience to
> userspace applications.
>
> Signed-off-by: Robert Foss 
> ---
> Changes since v2:
>  - Changed define prefix from DRM_MODE_PROP_ to DRM_MODE_
>  - Fix compilation errors
>  - Changed comment formatting
>  - Deduplicated comment lines
>  - Clarified DRM_MODE_PROP_REFLECT_ comment
>
> Changes since v1:
>  - Moved defines from drm.h to drm_mode.h
>  - Changed define prefix from DRM_ to DRM_MODE_PROP_
>  - Updated uses of the defines to the new prefix
>  - Removed include from drm_rect.c
>  - Stopped using the BIT() macro
>
Reviewed-by: Emil Velikov 

-Emil


Re: [Intel-gfx] [PATCH v2 2/3] drm: Create a format/modifier blob

2017-05-18 Thread Ville Syrjälä
On Wed, May 17, 2017 at 05:46:18PM -0700, Ben Widawsky wrote:
> On 17-05-17 01:06:16, Emil Velikov wrote:
> >Hi Ben,
> >
> >On 16 May 2017 at 22:31, Ben Widawsky  wrote:
> >> Updated blob layout (Rob, Daniel, Kristian, xerpi)
> >>
> >> v2:
> >> * Removed __packed, and alignment (.+)
> >> * Fix indent in drm_format_modifier fields (Liviu)
> >> * Remove duplicated modifier > 64 check (Liviu)
> >> * Change comment about modifier (Liviu)
> >> * Remove arguments to blob creation, use plane instead (Liviu)
> >> * Fix data types (Ben)
> >> * Make the blob part of uapi (Daniel)
> >>
> >> Cc: Rob Clark 
> >> Cc: Daniel Stone 
> >> Cc: Kristian H. Kristensen 
> >> Cc: Liviu Dudau 
> >> Signed-off-by: Ben Widawsky 
> >> Reviewed-by: Daniel Stone 
> >
> >I think this is almost perfect, barring the UAPI nitpick.
> >The rest is somewhat of a bikeshedding.
> >
> >With the UAPI resolved, regardless of the rest
> >Reviewed-by: Emil Velikov 
> >
> >
> >> --- a/drivers/gpu/drm/drm_plane.c
> >> +++ b/drivers/gpu/drm/drm_plane.c
> >
> >> +static int create_in_format_blob(struct drm_device *dev, struct drm_plane 
> >> *plane)
> >> +{
> >> +   const struct drm_mode_config *config = &dev->mode_config;
> >> +   const uint64_t *temp_modifiers = plane->modifiers;
> >> +   unsigned int format_modifier_count = 0;
> >> +   struct drm_property_blob *blob = NULL;
> >> +   struct drm_format_modifier *mod;
> >> +   size_t blob_size = 0, formats_size, modifiers_size;
> >There's no need to initialize blob and blob_size here.
> >
> >> +   struct drm_format_modifier_blob *blob_data;
> >> +   int i, j, ret = 0;
> >Make i and j unsigned to match format_modifier_count and
> >plane->format_count respectively.
> >Then expand ret in the only place where it's used?
> >
> 
> Oh. ret has lost its utility over the iterations of this patch. Make i and j
> unsigned and dropped ret.

Unsigned loop iterators will likely bite someone some day, especially
if they're called something like 'i', 'j', etc. IMO it's best to keep
them signed.

> 
> >> +
> >> +   if (plane->modifiers)
> >> +   while (*temp_modifiers++ != DRM_FORMAT_MOD_INVALID)
> >> +   format_modifier_count++;
> >> +
> >> +   formats_size = sizeof(*plane->format_types) * plane->format_count;
> >> +   if (WARN_ON(!formats_size)) {
> >> +   /* 0 formats are never expected */
> >> +   return 0;
> >> +   }
> >> +
> >> +   modifiers_size =
> >> +   sizeof(struct drm_format_modifier) * format_modifier_count;
> >> +
> >> +   blob_size = sizeof(struct drm_format_modifier_blob);
> >> +   blob_size += ALIGN(formats_size, 8);
> >Worth having the "Modifiers offset is a pointer..." comment moved/copied 
> >here?
> >
> 
> Yes it is.
> 
> >> +   blob_size += modifiers_size;
> >> +
> >> +   blob = drm_property_create_blob(dev, blob_size, NULL);
> >> +   if (IS_ERR(blob))
> >> +   return -1;
> >> +
> >Maybe propagate the exact error... Hmm we don't seem to check if
> >create_in_format_blob fails either so perhaps it's not worth it.
> >
> >
> 
> In this case it's almost definitely ENOMEM. All values should be verified - I
> think the other errors are there for when userspace is utilizing blob 
> creation.
> 
> So I'll just leave it.
> 
> >> --- a/include/uapi/drm/drm_mode.h
> >> +++ b/include/uapi/drm/drm_mode.h
> >> @@ -665,6 +665,56 @@ struct drm_mode_atomic {
> >> __u64 user_data;
> >>  };
> >>
> >> +struct drm_format_modifier_blob {
> >> +#define FORMAT_BLOB_CURRENT 1
> >> +   /* Version of this blob format */
> >> +   u32 version;
> >> +
> >> +   /* Flags */
> >> +   u32 flags;
> >> +
> >> +   /* Number of fourcc formats supported */
> >> +   u32 count_formats;
> >> +
> >> +   /* Where in this blob the formats exist (in bytes) */
> >> +   u32 formats_offset;
> >> +
> >> +   /* Number of drm_format_modifiers */
> >> +   u32 count_modifiers;
> >> +
> >> +   /* Where in this blob the modifiers exist (in bytes) */
> >> +   u32 modifiers_offset;
> >> +
> >> +   /* u32 formats[] */
> >> +   /* struct drm_format_modifier modifiers[] */
> >> +};
> >> +
> >> +struct drm_format_modifier {
> >> +   /* Bitmask of formats in get_plane format list this info applies 
> >> to. The
> >> +* offset allows a sliding window of which 64 formats (bits).
> >> +*
> >> +* Some examples:
> >> +* In today's world with < 65 formats, and formats 0, and 2 are
> >> +* supported
> >> +* 0x0005
> >> +*^-offset = 0, formats = 5
> >> +*
> >> +* If the number formats grew to 128, and formats 98-102 are
> >> +* supported with the modifier:
> >> +*
> >> +* 0x003c 
> >> +*^
> >> +*|__offset = 64, formats = 0x3c
> >> +*
>

[Intel-gfx] [BACKPORT v4.12-rc1] drm/i915: Do not sync RCU during shrinking

2017-05-18 Thread Joonas Lahtinen
Due to the complex dependencies between workqueues and RCU, which
are not easily detected by lockdep, do not synchronize RCU during
shrinking.

On low-on-memory systems (mem=1G for example), the RCU sync leads
to all system workqueues freezing and unrelated lockdep splats are
displayed according to reports. GIT bisecting done by J. R.
Okajima points to the commit where RCU syncing was extended.

RCU sync gains us very little benefit in real life scenarios
where the amount of memory used by object backing storage is
dominant over the metadata under RCU, so drop it altogether.

 " Yeeeaah, if core could just, go ahead and reclaim RCU
   queues, that'd be great. "

  - Chris Wilson, 2016 (0eafec6d3244)

v2: More information to commit message.
v3: Remove "grep _rcu_" escapee from i915_gem_shrink_all (Andrea)

Fixes: c053b5a506d3 ("drm/i915: Don't call synchronize_rcu_expedited under 
struct_mutex")
Suggested-by: Chris Wilson 
Reported-by: J. R. Okajima 
Signed-off-by: Joonas Lahtinen 
Reviewed-by: Chris Wilson 
Tested-by: Hugh Dickins 
Tested-by: Andrea Arcangeli 
Cc: Chris Wilson 
Cc: Tvrtko Ursulin 
Cc: J. R. Okajima 
Cc: Andrea Arcangeli 
Cc: Hugh Dickins 
Cc: Jani Nikula 
Cc:  # v4.11+
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c 
b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 129ed30..57d9f7f 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -59,9 +59,6 @@ static void i915_gem_shrinker_unlock(struct drm_device *dev, 
bool unlock)
return;
 
mutex_unlock(&dev->struct_mutex);
-
-   /* expedite the RCU grace period to free some request slabs */
-   synchronize_rcu_expedited();
 }
 
 static bool any_vma_pinned(struct drm_i915_gem_object *obj)
@@ -274,8 +271,6 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private 
*dev_priv)
I915_SHRINK_ACTIVE);
intel_runtime_pm_put(dev_priv);
 
-   synchronize_rcu(); /* wait for our earlier RCU delayed slab frees */
-
return freed;
 }
 
-- 
2.7.4



Re: [Intel-gfx] [PATCH v3 1/3] drm: Plumb modifiers through plane init

2017-05-18 Thread Liviu Dudau
On Wed, May 17, 2017 at 05:26:14PM -0700, Ben Widawsky wrote:
> On 17-05-17 11:17:57, Liviu Dudau wrote:
> > On Tue, May 16, 2017 at 02:31:24PM -0700, Ben Widawsky wrote:
> > > This is the plumbing for supporting fb modifiers on planes. Modifiers
> > > have already been introduced to some extent, but this series will extend
> > > this to allow querying modifiers per plane. Based on this, the client to
> > > enable optimal modifications for framebuffers.
> > > 
> > > This patch simply allows the DRM drivers to initialize their list of
> > > supported modifiers upon initializing the plane.
> > > 
> > > v2: A minor addition from Daniel
> > > 
> > > v3: Updated commit message
> > > s/INVALID/DRM_FORMAT_MOD_INVALID (Liviu)
> > > Remove some excess newlines (Liviu)
> > > Update comment for > 64 modifiers (Liviu)
> > > 
> > > Cc: Liviu Dudau 
> > > Reviewed-by: Daniel Stone  (v2)
> > > Signed-off-by: Ben Widawsky 
> > 
> > Minor nits, see below, but otherwise:
> > 
> > Reviewed-by: Liviu Dudau 
> > 
> > Thanks,
> > Liviu
> > 
> 
> Thank you. I took them all but the memcpy change, which I'm pretty sure is 
> okay.
> Take a quick look again and let me know.

[snip]
> 
> > > +  * format is encoded as a bit and the current code only supports a u64.
> > > +  */
> > > + if (WARN_ON(format_count > 64))
> > > + return -EINVAL;
> > > +
> > > + if (format_modifiers) {
> > > + const uint64_t *temp_modifiers = format_modifiers;
> > > + while (*temp_modifiers++ != DRM_FORMAT_MOD_INVALID)
> > > + format_modifier_count++;
> > > + }
> > > +
> > > + plane->modifier_count = format_modifier_count;
> > > + plane->modifiers = kmalloc_array(format_modifier_count,
> > > +  sizeof(format_modifiers[0]),
> > > +  GFP_KERNEL);
> > > +
> > > + if (format_modifier_count && !plane->modifiers) {
> > > + DRM_DEBUG_KMS("out of memory when allocating plane\n");
> > > + kfree(plane->format_types);
> > > + drm_mode_object_unregister(dev, &plane->base);
> > > + return -ENOMEM;
> > > + }
> > > +
> > >   if (name) {
> > >   va_list ap;
> > > 
> > > @@ -117,12 +145,15 @@ int drm_universal_plane_init(struct drm_device 
> > > *dev, struct drm_plane *plane,
> > >   }
> > >   if (!plane->name) {
> > >   kfree(plane->format_types);
> > > + kfree(plane->modifiers);
> > >   drm_mode_object_unregister(dev, &plane->base);
> > >   return -ENOMEM;
> > >   }
> > > 
> > >   memcpy(plane->format_types, formats, format_count * sizeof(uint32_t));
> > >   plane->format_count = format_count;
> > > + memcpy(plane->modifiers, format_modifiers,
> > > +format_modifier_count * sizeof(format_modifiers[0]));
> > 
> > I'm still worried that we can reach the memcpy with a NULL format_modifiers 
> > and we are opening
> > a security hole here.
> > 
> 
> I didn't notice your feedback here before; I apologize. I'm pretty certain
> you can't get here with !format_modifiers (well, you can, but then the 'n'
> for memcpy will be 0). format_modifier_count is only non-zero if
> format_modifiers is non-NULL.

That is a valid point. It then makes me ask why we even go through the dance
of allocating a zero-length array for plane->modifiers; can we not skip the
whole thing around plane->modifiers if format_modifiers is NULL?

[snip]

> -- 
> Ben Widawsky, Intel Open Source Technology Center

Thanks for the effort,
Liviu


-- 

| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---
¯\_(ツ)_/¯
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v4] drm/i915/gvt: return the correct usable aperture size under gvt environment

2017-05-18 Thread Joonas Lahtinen
On pe, 2017-05-12 at 03:20 +, Li, Weinan Z wrote:
> 
> Thanks for Joonas' guidance.
> I added 4 labels in intel_vgt_balloon for cleaning up ballooned space. The
> reserved size increases as each balloon space is reserved successfully, and
> the already-reserved spaces are cleaned up step by step if any reservation
> fails.

Yes, there could still be a vgt_deballoon_space() function to further
reduce code duplication. The labels should begin with err_, like
elsewhere in the driver.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: [Intel-gfx] [PATCH i-g-t v2 00/13] Fix IGTs for Android

2017-05-18 Thread Petri Latvala

Patches 1-4 and 6-13 are

Reviewed-by: Petri Latvala 

Nitpicks on the commit message on patch 6 (replied to it).



--
Petri Latvala



On Tue, May 16, 2017 at 03:24:49PM +0200, Arkadiusz Hiler wrote:
> IGTs have been broken on Android since the introduction of the dependency on
> procps. Over time other incompatibilities built up.
> 
> I took the liberty of fixing some of the issues, working around a couple of
> others, and blacklisting heavily incompatible tools/tests.
> 
> It builds on (almost) vanilla AOSP now.
> 
> Github: 
> Howto:  
> now including a docker example / script for your convenience
> 
> DEP1:  - just builds Google's one
> DEP2: 
> 
> We should include a note on Android compatibility in the README and do
> "continuous compilation" of the patches as they arrive on the mailing list,
> otherwise this **will get broken again soon**.
> 
> This is long as it is, but not complete yet.
> 
> Here are some of more obvious TODOs:
>  * introduce something like IGT_HAS_CAIRO define for convenience
>  * revise igt_kms dependency on cairo and enable everything that is
> independent
>  * revise kms tests and do the above
>  * review all things that are disabled on Android and try to enable them
>  * do something less ugly with config.h generation on Android
> 
> v2: a couple of suggested fixes + everything that became incompatible since
> the last patch
> 
> Cc: Petri Latvala 
> Cc: Antonio Argenziano 
> Cc: Vinay Belgaumkar 
> Cc: Chris Wilson 
> Cc: Robert Foss 
> 
> 
> Arkadiusz Hiler (13):
>   tests/drm_import_export: Include {i915_,}drm.h properly
>   Make conditions on HAVE_UDEV consistent
>   lib/igt_aux: Include unistd.h for gettid() on Android
>   lib/igt_aux: Make procps optional
>   chamelium: Fix build issues on Android
>   tools/Android.mk: Add guc_logger and l3_parity skip list
>   tests/Android.mk: Add perf to skip list
>   Android.mk: Fix libkmod use
>   Android.mk: Filter out *.h from src files
>   Android.mk: Use drm stubs
>   tools/Android.mk: Fix zlib inclusion
>   tests/gem_exec_nop: Disable headless subtest on cairoless Android
>   tests/gem_exec_nop: Rename signal() to fence_signal()
> 
>  benchmarks/Android.mk  |  5 -
>  configure.ac   |  6 +-
>  demos/Android.mk   |  3 ++-
>  lib/Android.mk |  7 ---
>  lib/Makefile.am|  7 +++
>  lib/Makefile.sources   |  7 ---
>  lib/igt.h  |  2 ++
>  lib/igt_aux.c  | 37 +
>  lib/igt_aux.h  |  5 +
>  lib/igt_chamelium.h|  3 +++
>  lib/igt_kmod.h |  4 
>  lib/tests/Android.mk   |  4 ++--
>  tests/Android.mk   |  9 +
>  tests/Makefile.am  |  6 ++
>  tests/Makefile.sources |  6 --
>  tests/drm_import_export.c  |  4 ++--
>  tests/gem_exec_nop.c   | 14 +-
>  tests/testdisplay_hotplug.c|  2 +-
>  tools/Android.mk   | 14 +-
>  tools/intel_l3_parity.c|  2 +-
>  tools/intel_l3_parity.h|  2 +-
>  tools/intel_l3_udev_listener.c |  2 +-
>  22 files changed, 106 insertions(+), 45 deletions(-)
> 
> -- 
> 2.9.3
> 


Re: [Intel-gfx] [PATCH i-g-t 06/13] tools/Android.mk: Add guc_logger and l3_parity skip list

2017-05-18 Thread Petri Latvala

On Tue, May 16, 2017 at 03:24:55PM +0200, Arkadiusz Hiler wrote:
> Those tools does not build on Android due to "Linux-only" dependencies.

s/does/do/


"to" missing in title.


--
Petri Latvala



> 
> Let's blacklist them for now.
> 
> Signed-off-by: Arkadiusz Hiler 
> ---
>  tools/Android.mk | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/Android.mk b/tools/Android.mk
> index 6cdedeb..0602e8c 100644
> --- a/tools/Android.mk
> +++ b/tools/Android.mk
> @@ -59,6 +59,8 @@ bin_PROGRAMS := $(tools_prog_lists)
>  
>  skip_tools_list := \
>  intel_framebuffer_dump \
> +intel_guc_logger \
> +intel_l3_parity \
>  intel_reg_dumper \
>  intel_vga_read \
>  intel_vga_write
> -- 
> 2.9.3
> 


[Intel-gfx] [PATCH 11/12] drm/i915/skl: New ddb allocation algorithm

2017-05-18 Thread Mahesh Kumar
From: "Kumar, Mahesh" 

This patch implements the new DDB allocation algorithm recommended by the
HW team. The algorithm takes care of the scenario where we would otherwise
allocate less DDB for planes with a lower relative pixel rate even though
they require more DDB to work.
It also takes care of enabling the same watermark level for each
plane in the CRTC, for efficient power saving.

Changes since v1:
 - Rebase on top of Paulo's patch series

Changes since v2:
 - Fix the for loop condition to enable WM

Changes since v3:
 - Fix crash in cursor i-g-t reported by Maarten
 - Rebase after addressing Paulo's comments
 - Few other ULT fixes

Changes since v4:
 - Rebase on drm-tip
 - Added separate function to enable WM levels

Changes since v5:
 - Fix a crash identified in skl-6770HQ system

Changes since v6:
 - Address review comments from Matt

Changes since v7:
 - Fix failure return in skl_compute_plane_wm (Matt)
 - Fix typo

Signed-off-by: Mahesh Kumar 
Reviewed-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/intel_pm.c | 256 
 1 file changed, 156 insertions(+), 100 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 936eef1634c7..a6a4e87c7ca6 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4039,13 +4039,41 @@ skl_ddb_calc_min(const struct intel_crtc_state *cstate, 
int num_active,
minimum[PLANE_CURSOR] = skl_cursor_allocation(num_active);
 }
 
+static void
+skl_enable_plane_wm_levels(const struct drm_i915_private *dev_priv,
+  uint16_t plane_ddb,
+  uint16_t max_level,
+  struct skl_plane_wm *wm)
+{
+   int level;
+   /*
+* Now enable all levels in WM structure which can be enabled
+* using current DDB allocation
+*/
+   for (level = ilk_wm_max_level(dev_priv); level >= 0; level--) {
+   struct skl_wm_level *level_wm = &wm->wm[level];
+
+   if (level > max_level || level_wm->plane_res_b == 0
+ || level_wm->plane_res_l >= 31
+ || level_wm->plane_res_b >= plane_ddb) {
+   level_wm->plane_en = false;
+   level_wm->plane_res_b = 0;
+   level_wm->plane_res_l = 0;
+   } else {
+   level_wm->plane_en = true;
+   }
+   }
+}
+
 static int
 skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
+ struct skl_pipe_wm *pipe_wm,
  struct skl_ddb_allocation *ddb /* out */)
 {
struct drm_atomic_state *state = cstate->base.state;
struct drm_crtc *crtc = cstate->base.crtc;
struct drm_device *dev = crtc->dev;
+   struct drm_i915_private *dev_priv = to_i915(dev);
struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
enum pipe pipe = intel_crtc->pipe;
struct skl_ddb_entry *alloc = &cstate->wm.skl.ddb;
@@ -4058,6 +4086,8 @@ skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
unsigned plane_data_rate[I915_MAX_PLANES] = {};
unsigned plane_y_data_rate[I915_MAX_PLANES] = {};
uint16_t total_min_blocks = 0;
+   uint16_t total_level_ddb;
+   int max_level, level;
 
/* Clear the partitioning for disabled planes. */
memset(ddb->plane[pipe], 0, sizeof(ddb->plane[pipe]));
@@ -4096,10 +4126,48 @@ skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
return -EINVAL;
}
 
-   alloc_size -= total_min_blocks;
-   ddb->plane[pipe][PLANE_CURSOR].start = alloc->end - 
minimum[PLANE_CURSOR];
+   alloc_size -= minimum[PLANE_CURSOR];
+   ddb->plane[pipe][PLANE_CURSOR].start = alloc->end -
+   minimum[PLANE_CURSOR];
ddb->plane[pipe][PLANE_CURSOR].end = alloc->end;
 
+   for (level = ilk_wm_max_level(dev_priv); level >= 0; level--) {
+   total_level_ddb = 0;
+   for_each_plane_id_on_crtc(intel_crtc, plane_id) {
+   /*
+* TODO: We should calculate watermark values for Y/UV
+* plane both in case of NV12 format and use both values
+* for ddb calculation. NV12 is disabled as of now, So
+* using only single/UV plane value here.
+*/
+   struct skl_plane_wm *wm = &pipe_wm->planes[plane_id];
+   uint16_t plane_res_b = wm->wm[level].plane_res_b;
+   uint16_t min = minimum[plane_id] + y_minimum[plane_id];
+
+   if (plane_id == PLANE_CURSOR)
+   continue;
+
+   total_level_ddb += max(plane_res_b, min);
+   }
+
+   /*
+* If This level can successfully be enabled with the
+* pipe's c

Re: [Intel-gfx] [PATCH] drm/i915: Cancel reset-engine if we couldn't find an active request

2017-05-18 Thread Chris Wilson
On Wed, May 17, 2017 at 06:11:06PM -0700, Michel Thierry wrote:
> On 17/05/17 13:52, Chris Wilson wrote:
> >On Wed, May 17, 2017 at 01:41:34PM -0700, Michel Thierry wrote:
> >>@@ -2827,21 +2829,35 @@ int i915_gem_reset_prepare_engine(struct 
> >>intel_engine_cs *engine)
> >>
> >>if (engine_stalled(engine)) {
> >>request = i915_gem_find_active_request(engine);
> >>-   if (request && request->fence.error == -EIO)
> >>-   err = -EIO; /* Previous reset failed! */
> >>+
> >>+   if (request) {
> >>+   if (request->fence.error == -EIO)
> >>+   return ERR_PTR(-EIO); /* Previous reset failed! 
> >>*/
> >>+
> >>+   if (i915_gem_request_completed(request))
> >>+   return NULL; /* request completed, skip it */
> >
> >This check is pointless here. We are just a few cycles since it was
> >known to be true. Both paths should be doing it just before the actual
> >reset for symmetry.
> 
> As you said, in gem_reset_request, 'guilty' should check for
> i915_gem_request_completed instead of engine_stalled... but at that
> point it's too late to cancel the reset (intel_gpu_reset has already
> been called).

Ok. At that point we are just deciding between skipping the request or
replaying it. The motivation behind carrying forward the active_request
was to avoid the repeated searches + engine_stalled() checks (since any
future check can then just confirm the active_request is still
incomplete).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

