[Intel-gfx] [PATCH] drm/i915: Improve HiZ throughput on Cherryview.

2015-01-11 Thread Kenneth Graunke
Found by reading the HIZ_CHICKEN documentation.

Improves performance in a HiZ microbenchmark by around 50%.
Improves performance in OglZBuffer by around 18%.

Thanks to Chris Wilson for helping me figure out where to put this.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 drivers/gpu/drm/i915/i915_reg.h | 3 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0f32fd1a..a39bb03 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5202,6 +5202,9 @@ enum punit_power_well {
 #define COMMON_SLICE_CHICKEN2  0x7014
 # define GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE  (1<<0)
 
+#define HIZ_CHICKEN				0x7018
+# define CHV_HZ_8X8_MODE_IN_1X			(1<<15)
+
 #define GEN7_L3SQCREG1 0xB010
 #define  VLV_B0_WA_L3SQCREG1_VALUE 0x00D3
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 12a36f0..dabc1d8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -836,6 +836,9 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
  HDC_FORCE_NON_COHERENT |
  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
 
+   /* Improve HiZ throughput on CHV. */
+   WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
+
 	return 0;
 }
 
-- 
2.2.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
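
For readers unfamiliar with masked registers, here is a minimal userspace sketch of the value a masked-bit write carries. It assumes the usual i915 convention that bits 31:16 of the written value act as a write mask for bits 15:0, and that CHV_HZ_8X8_MODE_IN_1X is bit 15. The helper names are illustrative, not the driver's macros.

```c
#include <stdint.h>

/* Assumption: CHV_HZ_8X8_MODE_IN_1X is bit 15 of HIZ_CHICKEN (0x7018). */
#define CHV_HZ_8X8_MODE_IN_1X (1u << 15)

/* Value that sets `bit` and tells hardware to leave every other bit alone. */
static inline uint32_t masked_bit_enable(uint32_t bit)
{
	return (bit << 16) | bit;
}

/* Value that clears `bit` and leaves every other bit alone. */
static inline uint32_t masked_bit_disable(uint32_t bit)
{
	return bit << 16;
}
```

Under these assumptions, masked_bit_enable(CHV_HZ_8X8_MODE_IN_1X) is the value the workaround would place in HIZ_CHICKEN.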


[Intel-gfx] [PATCH] drm/i915: Enable the HiZ RAW Stall Optimization on Gen8.

2015-01-11 Thread Kenneth Graunke
This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer.  Certain workloads would run very slowly with
HiZ enabled, but run much faster with the hiz=false driconf option.
With this patch, they run at full speed even with HiZ.

Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
(Iris Pro 6200).

Thanks to Jesse Barnes for finding this missing bit!
Thanks to Chris Wilson for helping me find where to set it.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: Jesse Barnes jbar...@virtuousgeek.org
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++
 1 file changed, 15 insertions(+)

Here's an alternate patch which implements the workaround in the kernel
instead of Mesa.  It's probably better to do it there, since the kernel
does it on Haswell already.

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dabc1d8..23020d6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -796,6 +796,16 @@ static int bdw_init_workarounds(struct intel_engine_cs *ring)
  HDC_DONOT_FETCH_MEM_WHEN_MASKED |
  (IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE : 0));
 
+   /* From the Haswell PRM, Command Reference: Registers, CACHE_MODE_0:
+* The Hierarchical Z RAW Stall Optimization allows non-overlapping
+*  polygons in the same 8x4 pixel/sample area to be processed without
+*  stalling waiting for the earlier ones to write to Hierarchical Z
+*  buffer.
+*
+* This optimization is off by default for Broadwell; turn it on.
+*/
+   WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
 	/* Wa4x4STCOptimizationDisable:bdw */
 	WA_SET_BIT_MASKED(CACHE_MODE_1,
 			  GEN8_4x4_STC_OPTIMIZATION_DISABLE);
@@ -836,6 +846,11 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
  HDC_FORCE_NON_COHERENT |
  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
 
+   /* According to the CACHE_MODE_0 default value documentation, some
+* CHV platforms disable this optimization by default.  Turn it on.
+*/
+   WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
 	/* Improve HiZ throughput on CHV. */
 	WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
 
-- 
2.2.1
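
As a rough illustration of why clearing a "disable" bit turns the optimization on, the sketch below models how hardware might apply a masked write to CACHE_MODE_0. The bit position of HIZ_RAW_STALL_OPT_DISABLE (bit 2) and both function names are assumptions made for this example only.

```c
#include <stdint.h>

/* Assumed bit position, for illustration. */
#define HIZ_RAW_STALL_OPT_DISABLE (1u << 2)

/* Model of a masked register: only bits whose mask bit (31:16) is set change. */
static uint32_t apply_masked_write(uint32_t reg, uint32_t value)
{
	uint32_t mask = value >> 16;
	uint32_t data = value & 0xffffu;

	return (reg & ~mask) | (data & mask);
}

/* Clear the "disable" bit; all other bits in the register are preserved. */
static uint32_t enable_hiz_raw_stall_opt(uint32_t cache_mode_0)
{
	return apply_masked_write(cache_mode_0, HIZ_RAW_STALL_OPT_DISABLE << 16);
}
```

In this model a register holding 0x5 (bits 0 and 2 set) becomes 0x1 after the write, so unrelated settings survive.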



[Intel-gfx] WARNING at drivers/gpu/drm/i915/intel_audio.c:291 ilk_audio_codec_disable

2015-01-11 Thread Daniel Borkmann

I constantly get this warning triggered on bootup. Any ideas? ;)

Thanks a lot,
Daniel

[...]
[4.634728] [drm] Memory usable by graphics device = 2048M
[4.636017] checking generic (9000 42) vs hw (9000 1000)
[4.636020] fb: switching to inteldrmfb from EFI VGA
[4.637030] Console: switching to colour dummy device 80x25
[4.637393] usb 4-1.8.1.1: new full-speed USB device number 6 using ehci-pci
[4.638716] [drm] Replacing VGA console driver
[4.687008] apple 0003:05AC:0249.0002: hidraw1: USB HID v1.11 Device [Apple 
Inc. Apple Internal Keyboard / Trackpad] on usb-:00:1d.0-1.8.2/input1
[4.691830] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[4.691836] [drm] Driver supports precise vblank timestamp query.
[4.692334] vgaarb: device changed decodes: 
PCI::00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[4.714479] usb 4-1.8.1.1: New USB device found, idVendor=05ac, 
idProduct=820a
[4.714490] usb 4-1.8.1.1: New USB device strings: Mfr=0, Product=0, 
SerialNumber=0
[4.716769] input: HID 05ac:820a as 
/devices/pci:00/:00:1d.0/usb4/4-1/4-1.8/4-1.8.1/4-1.8.1.1/4-1.8.1.1:1.0/0003:05AC:820A.0003/input/input5
[4.742378] ACPI: Video Device [IGPU] (multi-head: yes  rom: no  post: no)
[4.743551] acpi device:06: registered as cooling_device4
[4.744023] input: Video Bus as 
/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input6
[4.768459] hid-generic 0003:05AC:820A.0003: input,hidraw2: USB HID v1.11 
Keyboard [HID 05ac:820a] on usb-:00:1d.0-1.8.1.1/input0
[4.768810] [drm] Initialized i915 1.6.0 20141121 for :00:02.0 on minor 0
[4.801685] [drm] GMBUS [i915 gmbus vga] timed out, falling back to bit 
banging on pin 2
[4.811196] fbcon: inteldrmfb (fb0) is primary device
[4.811359] [ cut here ]
[4.811387] WARNING: CPU: 0 PID: 96 at drivers/gpu/drm/i915/intel_audio.c:291 ilk_audio_codec_disable+0x166/0x1a0 [i915]()
[4.811388] WARN_ON(!port)
[4.811392] Modules linked in: i915 i2c_algo_bit drm_kms_helper drm i2c_core 
video
[4.811394] CPU: 0 PID: 96 Comm: kworker/u16:6 Not tainted 3.19.0-rc2+ #81
[4.811395] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, 
BIOS MBA51.88Z.00EF.B02.1211271028 11/27/2012
[4.811400] Workqueue: events_unbound async_run_entry_fn
[4.811402]  a0164018 880261baf608 8174e91c 
880265785a00
[4.811404]  880261baf658 880261baf648 810913ca 
0282
[4.811407]  88025f50 0111 000e50c0 

[4.811407] Call Trace:
[4.811412]  [8174e91c] dump_stack+0x4c/0x65
[4.811415]  [810913ca] warn_slowpath_common+0x8a/0xc0
[4.811417]  [81091446] warn_slowpath_fmt+0x46/0x50
[4.811435]  [a00f7606] ilk_audio_codec_disable+0x166/0x1a0 [i915]
[4.811451]  [a00f7ece] intel_audio_codec_disable+0x1e/0x30 [i915]
[4.811472]  [a012f1b5] intel_disable_dp+0x65/0x70 [i915]
[4.811489]  [a010a7ea] ironlake_crtc_disable+0x16a/0x7c0 [i915]
[4.811504]  [a010be19] __intel_set_mode+0xa69/0xc80 [i915]
[4.811520]  [a0112752] intel_crtc_set_config+0x9f2/0xf80 [i915]
[4.811530]  [a00327a2] drm_mode_set_config_internal+0x72/0x120 
[drm]
[4.811535]  [a0092478] restore_fbdev_mode+0xc8/0xf0 
[drm_kms_helper]
[4.811539]  [a0094359] 
drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper]
[4.811542]  [a00943d2] drm_fb_helper_set_par+0x22/0x50 
[drm_kms_helper]
[4.811560]  [a011fe7a] intel_fbdev_set_par+0x1a/0x60 [i915]
[4.811563]  [813e5d14] fbcon_init+0x4f4/0x580
[4.811565]  [81460fcc] visual_init+0xbc/0x120
[4.811567]  [81463693] do_bind_con_driver+0x163/0x330
[4.811569]  [81463e14] do_take_over_console+0x114/0x1c0
[4.811571]  [813e13f3] do_fbcon_takeover+0x63/0xd0
[4.811573]  [813e67bd] fbcon_event_notify+0x68d/0x7e0
[4.811575]  [810b137c] notifier_call_chain+0x4c/0x70
[4.811578]  [810b1653] __blocking_notifier_call_chain+0x53/0x70
[4.811580]  [810b1686] blocking_notifier_call_chain+0x16/0x20
[4.811582]  [813ecc5b] fb_notifier_call_chain+0x1b/0x20
[4.811584]  [813eef3c] register_framebuffer+0x1ec/0x330
[4.811588]  [a009465c] drm_fb_helper_initial_config+0x25c/0x3b0 
[drm_kms_helper]
[4.811604]  [a01211ab] intel_fbdev_initial_config+0x1b/0x20 [i915]
[4.811616]  [810b305a] async_run_entry_fn+0x4a/0x140
[4.811618]  [810aaa72] process_one_work+0x1c2/0x4f0
[4.811620]  [810aa9fb] ? process_one_work+0x14b/0x4f0
[4.811622]  [810aaebb] worker_thread+0x11b/0x4a0
[4.811624]  [810aada0] ? process_one_work+0x4f0/0x4f0
[4.811626]  [810b01ef] 

Re: [Intel-gfx] [PATCH 2/10] drm/i915: Initialize DRRS delayed work

2015-01-11 Thread Chris Wilson
On Sat, Jan 10, 2015 at 02:25:57AM +0530, Vandana Kannan wrote:
 Add DRRS work function to trigger a switch to low refresh rate when activity
 is detected on screen.

Where is this function used? How can I judge that it does the right
thing?

 Signed-off-by: Vandana Kannan vandana.kan...@intel.com
 ---
  drivers/gpu/drm/i915/intel_dp.c | 36 
  1 file changed, 28 insertions(+), 8 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
 index 778dcd0..30b3aa1 100644
 --- a/drivers/gpu/drm/i915/intel_dp.c
 +++ b/drivers/gpu/drm/i915/intel_dp.c
 @@ -4814,20 +4814,38 @@ static void intel_dp_set_drrs_state(struct drm_device 
 *dev, int refresh_rate)
   I915_WRITE(reg, val);
   }
  
 + dev_priv->drrs.refresh_rate_type = index;
 +
 + DRM_DEBUG_KMS("eDP Refresh Rate set to : %dHz\n", refresh_rate);
 +}
 +
 +static void intel_edp_drrs_work(struct work_struct *work)

intel_edp_drrs_downclock_work() would be more self-descriptive

 +{
 + struct drm_i915_private *dev_priv =
 + container_of(work, typeof(*dev_priv), drrs.work.work);
 + struct intel_dp *intel_dp = dev_priv->drrs.dp;
 +
 + mutex_lock(&dev_priv->drrs.mutex);
 +
 + if (!intel_dp)
 + goto unlock;

Does dev_priv->drrs.mutex not also protect dev_priv->drrs.dp?

 +
   /*
 -  * mutex taken to ensure that there is no race between differnt
 -  * drrs calls trying to update refresh rate. This scenario may occur
 -  * in future when idleness detection based DRRS in kernel and
 -  * possible calls from user space to set differnt RR are made.
 +  * The delayed work can race with an invalidate hence we need to
 +  * recheck.
*/

This comment no longer applies to all the other callers of
intel_dp_set_drrs_state()? Or did you miss adding the
lockdep_assert_held(&dev_priv->drrs.mutex)?

 - mutex_lock(&dev_priv->drrs.mutex);
 + if (dev_priv->drrs.busy_frontbuffer_bits)
 + goto unlock;
  
 - dev_priv->drrs.refresh_rate_type = index;
 + if (dev_priv->drrs.refresh_rate_type != DRRS_LOW_RR)
 + intel_dp_set_drrs_state(dev_priv->dev,

Would it not be sensible for intel_dp_set_drrs_state() to check for the
no-op itself?

 + intel_dp->attached_connector->panel.
 + downclock_mode->vrefresh);
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
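
The two suggestions above (assert that the lock is held, and let the setter detect the no-op itself) can be sketched in userspace as follows. This is only an illustration: struct drrs_state, its fields, and set_drrs_state() are hypothetical stand-ins, and a plain assert() plays the role of lockdep_assert_held().

```c
#include <assert.h>
#include <stdbool.h>

enum drrs_type { DRRS_HIGH_RR, DRRS_LOW_RR };

/* Hypothetical stand-in for the driver's DRRS bookkeeping. */
struct drrs_state {
	bool lock_held;        /* models "caller holds drrs.mutex" */
	enum drrs_type type;   /* current refresh rate type */
	int hw_writes;         /* counts simulated register updates */
};

/* Returns true if hardware was reprogrammed, false if it was a no-op. */
static bool set_drrs_state(struct drrs_state *s, enum drrs_type wanted)
{
	assert(s->lock_held);  /* userspace analogue of lockdep_assert_held() */

	if (s->type == wanted)
		return false;  /* already in the wanted state, skip the write */

	s->hw_writes++;
	s->type = wanted;
	return true;
}
```

With the check inside the setter, callers such as the delayed work no longer need their own comparison against DRRS_LOW_RR.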


Re: [Intel-gfx] [PATCH 9/10] drm/i915: Add debugfs entry for DRRS

2015-01-11 Thread Chris Wilson
On Sat, Jan 10, 2015 at 02:26:04AM +0530, Vandana Kannan wrote:
 Adding a debugfs entry to determine if DRRS is supported or not
 
 Signed-off-by: Vandana Kannan vandana.kan...@intel.com
 ---
  drivers/gpu/drm/i915/i915_debugfs.c | 18 ++
  1 file changed, 18 insertions(+)
 
 diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
 b/drivers/gpu/drm/i915/i915_debugfs.c
 index e515aad..544b4c3 100644
 --- a/drivers/gpu/drm/i915/i915_debugfs.c
 +++ b/drivers/gpu/drm/i915/i915_debugfs.c
 @@ -2825,6 +2825,23 @@ static int i915_ddb_info(struct seq_file *m, void 
 *unused)
   return 0;
  }
  
 +static int i915_drrs_status(struct seq_file *m, void *unused)
 +{
 + struct drm_info_node *node = m->private;
 + struct drm_device *dev = node->minor->dev;
 + struct intel_crtc *crtc;
 +
 + for_each_intel_crtc(dev, crtc) {
 + if (crtc->active) {

Don't you want to know which CRTC this is? Would this not be better
extending display_info with the extra CRTC status?

 + if (crtc->config.has_drrs)
 + seq_puts(m, "DRRS enabled");
 + else
 + seq_puts(m, "DRRS disabled");
 + }
 + }
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
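
To make the "which CRTC is this?" point concrete, here is a small userspace sketch that emits one labelled line per active CRTC. struct crtc_info and format_drrs_status() are hypothetical, and snprintf() into a buffer stands in for the seq_file interface.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, trimmed-down CRTC record for illustration only. */
struct crtc_info {
	char pipe;      /* 'A', 'B', ... */
	bool active;
	bool has_drrs;
};

/* Writes one line per active CRTC into buf; returns the string length. */
static size_t format_drrs_status(char *buf, size_t len,
				 const struct crtc_info *crtcs, size_t n)
{
	size_t used = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		int w;

		if (!crtcs[i].active)
			continue;
		w = snprintf(buf + used, len - used, "pipe %c: DRRS %s\n",
			     crtcs[i].pipe,
			     crtcs[i].has_drrs ? "enabled" : "disabled");
		if (w < 0 || (size_t)w >= len - used)
			break;  /* error or out of space: stop here */
		used += (size_t)w;
	}
	return used;
}
```

Naming the pipe on each line keeps the output unambiguous when more than one CRTC is active.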


Re: [Intel-gfx] [PATCH 2/2] intel_frequency: A tool to manipulate Intel GPU frequency

2015-01-11 Thread Jordan Justen
On 2015-01-10 20:19:29, Ben Widawsky wrote:
 +   printf(Usage: %s [-ef] [--min | --max] [-g (min|max|efficient)][-s 
 frequency_mhz]\n\n, prog);

-ef = -f

Add space before [-s

 +   printf(%s A program to manipulate Intel GPU frequencies.\n\n, prog);
 +   printf(Options: \n);
 +   printf(  -eLock frequency to the most efficient 
 frequency\n);
 +   printf(  -g, --get=Get the frequency (optional arg: 
 \cur\|\min\|\max\|\eff\)\n);
 +   printf(  -s, --set Lock frequency to an absolute value (MHz)\n);
 +   printf(  -c, --custom  Set a min, or max frequency \min=X | 
 max=Y\\n);
 +   printf(  -m  --max Lock frequency to max frequency\n);
 +   printf(  --min Lock frequency to min (never a good idea, 
 DEBUG ONLY)\n);
 +   printf(Examples:\n);
 +   printf(intel_frequency -gmin,cur Get the current and minimum 
 frequency\n);
 +   printf(intel_frequency -s 400Lock frequency to 400Mhz\n);

Add -c example?

Series Reviewed-by: Jordan Justen jordan.l.jus...@intel.com
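
On the -c question: the help text describes an argument of the form min=X or max=Y. The sketch below shows one way such an argument could be parsed. It is not taken from the tool; struct custom_freq and parse_custom() are hypothetical names, and the accepted syntax is an assumption based only on the help text quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result of parsing "min=300" or "max=900". */
struct custom_freq {
	int is_max;  /* 0 for "min", 1 for "max" */
	int mhz;
};

/* Returns 0 on success, -1 if the argument is malformed. */
static int parse_custom(const char *arg, struct custom_freq *out)
{
	char key[4];
	int mhz;
	char trailing;

	/* Exactly two conversions must succeed: the key and the number. */
	if (sscanf(arg, "%3[a-z]=%d%c", key, &mhz, &trailing) != 2)
		return -1;
	if (mhz <= 0)
		return -1;

	if (strcmp(key, "min") == 0)
		out->is_max = 0;
	else if (strcmp(key, "max") == 0)
		out->is_max = 1;
	else
		return -1;

	out->mhz = mhz;
	return 0;
}
```

Rejecting trailing characters and unknown keys up front keeps a typo from silently locking the GPU to an unintended frequency.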


Re: [Intel-gfx] [Mesa-dev] [PATCH] drm/i915: Enable the HiZ RAW Stall Optimization on Gen8.

2015-01-11 Thread Kenneth Graunke
On Sunday, January 11, 2015 05:46:09 PM Ben Widawsky wrote:
 On Sun, Jan 11, 2015 at 04:05:25PM -0800, Kenneth Graunke wrote:
  On Sunday, January 11, 2015 01:49:41 PM Ben Widawsky wrote:
   On Sat, Jan 10, 2015 at 06:44:49PM -0800, Kenneth Graunke wrote:
This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer.  Certain workloads would run very slowly with
HiZ enabled, but run much faster with the hiz=false driconf option.
With this patch, they run at full speed even with HiZ.

Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
(Iris Pro 6200).

Thanks to Jesse Barnes for finding this missing bit!
Thanks to Chris Wilson for helping me find where to set it.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: Jesse Barnes jbar...@virtuousgeek.org
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++
 1 file changed, 15 insertions(+)

Here's an alternate patch which implements the workaround in the kernel
instead of Mesa.  It's probably better to do it there, since the kernel
does it on Haswell already.

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dabc1d8..23020d6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -796,6 +796,16 @@ static int bdw_init_workarounds(struct 
intel_engine_cs *ring)
  HDC_DONOT_FETCH_MEM_WHEN_MASKED |
  (IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE 
: 0));
 
+   /* From the Haswell PRM, Command Reference: Registers, 
CACHE_MODE_0:
+* The Hierarchical Z RAW Stall Optimization allows 
non-overlapping
+*  polygons in the same 8x4 pixel/sample area to be processed 
without
+*  stalling waiting for the earlier ones to write to 
Hierarchical Z
+*  buffer.
+*
+* This optimization is off by default for Broadwell; turn it 
on.
+*/
+   WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
/* Wa4x4STCOptimizationDisable:bdw */
WA_SET_BIT_MASKED(CACHE_MODE_1,
  GEN8_4x4_STC_OPTIMIZATION_DISABLE);
@@ -836,6 +846,11 @@ static int chv_init_workarounds(struct 
intel_engine_cs *ring)
  HDC_FORCE_NON_COHERENT |
  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
 
+   /* According to the CACHE_MODE_0 default value documentation, 
some
+* CHV platforms disable this optimization by default.  Turn it 
on.
+*/
+   WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
/* Improve HiZ throughput on CHV. */
WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
 
   
   I think you should do this as two separate patches, 1 per platform. For 
   the BSW
   patch (given that I had the same functionality in the kernel patch I 
   asked you
   to look at ;-) and FWIW, Jordan has numbers on BSW B-step with my kernel 
   patch
   which we can use for the commit):
   Signed-off-by: Ben Widawsky b...@bwidawsk.net
  
  Huh, I don't recall seeing that kernel patch.  Sorry.  I guess I'll split it
  and resubmit...
  
 
 It's not my call, it's just nice to have platform specific bisection. And the
 patch wasn't on the list, it was the one I kept asking you to look at in my
 branch :-)
 
   I haven't looked at Broadwell docs, so I'll let someone else take care of 
   that.
   
   I don't know if I agree with Chris that we should call these in the 
   workaround
   section, but whatever. init_clock_gating is equally sucky.
  
  init_clock_gating doesn't work.  The register writes don't stick and they 
  have
  no effect at all.  Setting them here makes them actually take effect in the
  context.
  
  --Ken
 
 Separate thread now, but are you sure? We're setting at least two context
 specific registers in there today, among them: GEN7_FF_THREAD_MODE (which is
 important to performance).

It looks like we're setting:

- [BDW] GAM_ECOCHK - 0x4090
- [BDW] CHICKEN_PAR1_1 - 0x42080
- [BDW] RC_SLEEP_PSMI_CONTROL - 0x2050
- [BDW, CHV] UCGCTL6 - 0x9430
- [BDW, CHV] FF_THREAD_MODE - 0x20a0
- [CHV] DSPCLK_GATE_D - display
- [CHV] MI_ARB_VLV - display
- [CHV] RC_SLEEP_PSMI_CONTROL - 0x12050
- [CHV] UCGCTL1 - 0x9400

I searched for all of these in the Register State Context tables for BDW
and CHV, and I didn't see any of them listed (including FF_THREAD_MODE or
0x20a0).  So I'm pretty sure these are not part of the context, and so they
should work.

--Ken


[Intel-gfx] drm stuck on render ring starting xlock on a x201 using SNA

2015-01-11 Thread Charles Devereaux
Hello

After trying to leave an xlock session started with "xlock -nolock +use3d
-mode gears", I get a "drm stuck on render ring" error.

The funny thing is that this problem did not happen on the exact same
ThinkPad X201 laptop, using an identical xorg.conf, under Debian jessie i386
(I have since done a clean reinstall of Debian jessie amd64).

xorg.conf contains:
Section "Device"
    Identifier  "i950"
    Driver      "intel"
    BusID       "PCI:0:2:0"
    Screen      0
    Option      "AccelMethod" "SNA"
    Option      "FallbackDebug" "true"
EndSection

I'm using kernel 3.14.25, and I saved /sys/class/drm/card0/error and
gpucrash in case it's not a known bug.

Any suggestion?

[0.396228] [drm] Initialized drm 1.1.0 20060810
[0.396870] [drm] Memory usable by graphics device = 2048M
[0.405464] i915 :00:02.0: irq 43 for MSI/MSI-X
[0.405480] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[0.405483] [drm] Driver supports precise vblank timestamp query.
[0.405586] vgaarb: device changed decodes:
PCI::00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[0.408539] ACPI: Battery Slot [BAT0] (battery present)
[0.460085] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off
[0.471385] fbcon: inteldrmfb (fb0) is primary device
(...)
[1.077765] [drm] Initialized i915 1.6.0 20080730 for :00:02.0 on
minor 0
(...)
[  207.574130] [drm] stuck on render ring
[  207.574149] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  207.574152] [drm] GPU hangs can indicate a bug anywhere in the entire
gfx stack, including userspace.
[  207.574154] [drm] Please file a _new_ bug report on bugs.freedesktop.org
against DRI - DRM/Intel
[  207.574155] [drm] drm/i915 developers can then reassign to the right
component if it's not a kernel issue.
[  207.574157] [drm] The gpu crash dump is required to analyze gpu hangs,
so please always attach it.
[  207.581041] [drm:init_ring_common] *ERROR* failed to set render ring
head to zero ctl  head 008014dc tail  start 3000
[  207.632121] [drm:init_ring_common] *ERROR* render ring initialization
failed ctl 0001f001 head 008014dc tail  start 3000


Re: [Intel-gfx] [Mesa-dev] [PATCH] drm/i915: Enable the HiZ RAW Stall Optimization on Gen8.

2015-01-11 Thread Kenneth Graunke
On Sunday, January 11, 2015 05:46:09 PM Ben Widawsky wrote:
 On Sun, Jan 11, 2015 at 04:05:25PM -0800, Kenneth Graunke wrote:
  On Sunday, January 11, 2015 01:49:41 PM Ben Widawsky wrote:
   On Sat, Jan 10, 2015 at 06:44:49PM -0800, Kenneth Graunke wrote:
This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer.  Certain workloads would run very slowly with
HiZ enabled, but run much faster with the hiz=false driconf option.
With this patch, they run at full speed even with HiZ.

Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
(Iris Pro 6200).

Thanks to Jesse Barnes for finding this missing bit!
Thanks to Chris Wilson for helping me find where to set it.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: Jesse Barnes jbar...@virtuousgeek.org
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++
 1 file changed, 15 insertions(+)

Here's an alternate patch which implements the workaround in the kernel
instead of Mesa.  It's probably better to do it there, since the kernel
does it on Haswell already.

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dabc1d8..23020d6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -796,6 +796,16 @@ static int bdw_init_workarounds(struct 
intel_engine_cs *ring)
  HDC_DONOT_FETCH_MEM_WHEN_MASKED |
  (IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE 
: 0));
 
+   /* From the Haswell PRM, Command Reference: Registers, 
CACHE_MODE_0:
+* The Hierarchical Z RAW Stall Optimization allows 
non-overlapping
+*  polygons in the same 8x4 pixel/sample area to be processed 
without
+*  stalling waiting for the earlier ones to write to 
Hierarchical Z
+*  buffer.
+*
+* This optimization is off by default for Broadwell; turn it 
on.
+*/
+   WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
/* Wa4x4STCOptimizationDisable:bdw */
WA_SET_BIT_MASKED(CACHE_MODE_1,
  GEN8_4x4_STC_OPTIMIZATION_DISABLE);
@@ -836,6 +846,11 @@ static int chv_init_workarounds(struct 
intel_engine_cs *ring)
  HDC_FORCE_NON_COHERENT |
  HDC_DONOT_FETCH_MEM_WHEN_MASKED);
 
+   /* According to the CACHE_MODE_0 default value documentation, 
some
+* CHV platforms disable this optimization by default.  Turn it 
on.
+*/
+   WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
+
/* Improve HiZ throughput on CHV. */
WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
 
   
   I think you should do this as two separate patches, 1 per platform. For 
   the BSW
   patch (given that I had the same functionality in the kernel patch I 
   asked you
   to look at ;-) and FWIW, Jordan has numbers on BSW B-step with my kernel 
   patch
   which we can use for the commit):
   Signed-off-by: Ben Widawsky b...@bwidawsk.net
  
  Huh, I don't recall seeing that kernel patch.  Sorry.  I guess I'll split it
  and resubmit...
  
 
 It's not my call, it's just nice to have platform specific bisection. And the
 patch wasn't on the list, it was the one I kept asking you to look at in my
 branch :-)

   I haven't looked at Broadwell docs, so I'll let someone else take care of 
   that.
   
   I don't know if I agree with Chris that we should call these in the 
   workaround
   section, but whatever. init_clock_gating is equally sucky.
  
  init_clock_gating doesn't work.  The register writes don't stick and they 
  have
  no effect at all.  Setting them here makes them actually take effect in the
  context.
  
  --Ken
 
 Separate thread now, but are you sure? We're setting at least two context
 specific registers in there today, among them: GEN7_FF_THREAD_MODE (which is
 important to performance).
 
 AFAIK it should stick, and if it doesn't it's not expected behavior. Unless 
 you
 know something I do not?

Jesse had suggested setting it in broadwell_init_clock_gating on January 5th,
and Valtteri tried it on January 7th.  He found no noticeable difference.
I tried it again, and confirmed his result: there was zero performance impact.

Setting it via an LRI in Mesa did have a performance impact.  I reverted my
Mesa patch, and tried setting it here, and it had the same performance impact.
I rebooted between kernels several times to confirm.  It works here, but it
doesn't there.

I'm pretty sure I confirmed the same result with this bit.  Feel free to try.

Perhaps we should move the rest of the per-context bits here instead of
*_init_clock_gating.  We 
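
One plausible reading of the behaviour described above is that the register is part of the per-context state, so a one-time MMIO write is overwritten when a context image is restored, while a write emitted inside the context is saved with it. The toy model below illustrates only that idea; struct toy_gpu and its functions are invented for this sketch and simplify the real save/restore sequence.

```c
#include <stdint.h>

/* Toy model: one register that is restored from a per-context image. */
struct toy_gpu {
	uint32_t live_reg;      /* what the hardware currently uses */
	uint32_t ctx_image[2];  /* saved copy for each of two contexts */
};

/* A one-time MMIO write touches only the live register. */
static void mmio_write(struct toy_gpu *gpu, uint32_t val)
{
	gpu->live_reg = val;
}

/* Switching contexts reloads the live register from the saved image. */
static void switch_to(struct toy_gpu *gpu, int ctx)
{
	gpu->live_reg = gpu->ctx_image[ctx];
}

/* A write emitted inside the context (for example via LRI) is saved with it. */
static void write_in_context(struct toy_gpu *gpu, int ctx, uint32_t val)
{
	gpu->live_reg = val;
	gpu->ctx_image[ctx] = val;
}
```

In this model the value set by mmio_write() disappears at the first switch_to(), whereas the value set by write_in_context() comes back whenever that context runs.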

Re: [Intel-gfx] [Mesa-dev] [PATCH] drm/i915: Enable the HiZ RAW Stall Optimization on Gen8.

2015-01-11 Thread Ben Widawsky
On Sun, Jan 11, 2015 at 07:05:21PM -0800, Kenneth Graunke wrote:
 On Sunday, January 11, 2015 05:46:09 PM Ben Widawsky wrote:
  On Sun, Jan 11, 2015 at 04:05:25PM -0800, Kenneth Graunke wrote:
   On Sunday, January 11, 2015 01:49:41 PM Ben Widawsky wrote:
On Sat, Jan 10, 2015 at 06:44:49PM -0800, Kenneth Graunke wrote:
 This is an important optimization for avoiding read-after-write (RAW)
 stalls in the HiZ buffer.  Certain workloads would run very slowly 
 with
 HiZ enabled, but run much faster with the hiz=false driconf option.
 With this patch, they run at full speed even with HiZ.
 
 Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
 (Iris Pro 6200).
 
 Thanks to Jesse Barnes for finding this missing bit!
 Thanks to Chris Wilson for helping me find where to set it.
 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 Cc: Jesse Barnes jbar...@virtuousgeek.org
 ---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++
  1 file changed, 15 insertions(+)
 
 Here's an alternate patch which implements the workaround in the 
 kernel
 instead of Mesa.  It's probably better to do it there, since the 
 kernel
 does it on Haswell already.
 
 diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
 b/drivers/gpu/drm/i915/intel_ringbuffer.c
 index dabc1d8..23020d6 100644
 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
 +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
 @@ -796,6 +796,16 @@ static int bdw_init_workarounds(struct 
 intel_engine_cs *ring)
 HDC_DONOT_FETCH_MEM_WHEN_MASKED |
 (IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE 
 : 0));
  
 + /* From the Haswell PRM, Command Reference: Registers, 
 CACHE_MODE_0:
 +  * The Hierarchical Z RAW Stall Optimization allows 
 non-overlapping
 +  *  polygons in the same 8x4 pixel/sample area to be processed 
 without
 +  *  stalling waiting for the earlier ones to write to 
 Hierarchical Z
 +  *  buffer.
 +  *
 +  * This optimization is off by default for Broadwell; turn it 
 on.
 +  */
 + WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
 +
   /* Wa4x4STCOptimizationDisable:bdw */
   WA_SET_BIT_MASKED(CACHE_MODE_1,
 GEN8_4x4_STC_OPTIMIZATION_DISABLE);
 @@ -836,6 +846,11 @@ static int chv_init_workarounds(struct 
 intel_engine_cs *ring)
 HDC_FORCE_NON_COHERENT |
 HDC_DONOT_FETCH_MEM_WHEN_MASKED);
  
 + /* According to the CACHE_MODE_0 default value documentation, 
 some
 +  * CHV platforms disable this optimization by default.  Turn it 
 on.
 +  */
 + WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
 +
   /* Improve HiZ throughput on CHV. */
   WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
  

I think you should do this as two separate patches, 1 per platform. For 
the BSW
patch (given that I had the same functionality in the kernel patch I 
asked you
to look at ;-) and FWIW, Jordan has numbers on BSW B-step with my 
kernel patch
which we can use for the commit):
Signed-off-by: Ben Widawsky b...@bwidawsk.net
   
   Huh, I don't recall seeing that kernel patch.  Sorry.  I guess I'll split 
   it
   and resubmit...
   
  
  It's not my call, it's just nice to have platform specific bisection. And 
  the
  patch wasn't on the list, it was the one I kept asking you to look at in my
  branch :-)
  
I haven't looked at Broadwell docs, so I'll let someone else take care 
of that.

I don't know if I agree with Chris that we should call these in the 
workaround
section, but whatever. init_clock_gating is equally sucky.
   
   init_clock_gating doesn't work.  The register writes don't stick and they 
   have
   no effect at all.  Setting them here makes them actually take effect in 
   the
   context.
   
   --Ken
  
  Separate thread now, but are you sure? We're setting at least two context
  specific registers in there today, among them: GEN7_FF_THREAD_MODE (which is
  important to performance).
 
 It looks like we're setting:
 
 - [BDW] GAM_ECOCHK - 0x4090

ECO registers are never ctx, I think

 - [BDW] CHICKEN_PAR1_1 - 0x42080

Display registers are never

 - [BDW] RC_SLEEP_PSMI_CONTROL - 0x2050

dword offset 0x1c in the context image

 - [BDW, CHV] UCGCTL6 - 0x9430
 - [BDW, CHV] FF_THREAD_MODE - 0x20a0

dword offset 0x2a in the context image

 - [CHV] DSPCLK_GATE_D - display
 - [CHV] MI_ARB_VLV - display

More display...

 - [CHV] RC_SLEEP_PSMI_CONTROL - 0x12050

Kinda surprised this one isn't there. I'm not sure how it can work correctly.

 - [CHV] UCGCTL1 - 0x9400
 
 I searched for all of these in 
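
The inline comments above can be collected into a small lookup table. The sketch below records only what this reply states: two registers with a dword offset in the context image, two marked as not context-saved, and the rest unknown. Attaching offset 0x2a to FF_THREAD_MODE rather than UCGCTL6 is an assumption, and the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum ctx_status { CTX_NO, CTX_YES, CTX_UNKNOWN };

struct reg_note {
	const char *name;
	uint32_t offset;
	enum ctx_status in_ctx;
	uint32_t ctx_dword;  /* only meaningful when in_ctx == CTX_YES */
};

/* Entries mirror the comments in this reply; nothing here comes from the PRM. */
static const struct reg_note notes[] = {
	{ "GAM_ECOCHK",            0x4090,  CTX_NO,      0 },
	{ "CHICKEN_PAR1_1",        0x42080, CTX_NO,      0 },
	{ "RC_SLEEP_PSMI_CONTROL", 0x2050,  CTX_YES,     0x1c },
	{ "UCGCTL6",               0x9430,  CTX_UNKNOWN, 0 },
	{ "FF_THREAD_MODE",        0x20a0,  CTX_YES,     0x2a },
	{ "UCGCTL1",               0x9400,  CTX_UNKNOWN, 0 },
};

/* Returns the note for a register offset, or NULL if it is not listed. */
static const struct reg_note *find_reg(uint32_t offset)
{
	for (size_t i = 0; i < sizeof(notes) / sizeof(notes[0]); i++)
		if (notes[i].offset == offset)
			return &notes[i];
	return NULL;
}
```

A table like this makes it easy to see which writes would need to move out of init_clock_gating if context-saved registers have to be set from within the context.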

[Intel-gfx] [PATCH] drm/i915: Cleaning up encoder in the end on intel_dp_encoder_destroy

2015-01-11 Thread Sonika Jindal
We clean up the encoder and then try to acquire pps_lock where it tries
to get the encoder. So moving the cleanup to the end.

Suggested-by: Satheeshakrishna M satheeshakrishn...@intel.com
Signed-off-by: Sonika Jindal sonika.jin...@intel.com
---
 drivers/gpu/drm/i915/intel_dp.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 64c7578..92415f4 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -4317,8 +4317,6 @@ void intel_dp_encoder_destroy(struct drm_encoder *encoder)
 	struct intel_dp *intel_dp = &intel_dig_port->dp;
 
 	drm_dp_aux_unregister(&intel_dp->aux);
-	intel_dp_mst_encoder_cleanup(intel_dig_port);
-	drm_encoder_cleanup(encoder);
 	if (is_edp(intel_dp)) {
 		cancel_delayed_work_sync(&intel_dp->panel_vdd_work);
 		/*
@@ -4334,6 +4332,8 @@ void intel_dp_encoder_destroy(struct drm_encoder *encoder)
 			intel_dp->edp_notifier.notifier_call = NULL;
 		}
 	}
+	intel_dp_mst_encoder_cleanup(intel_dig_port);
+	drm_encoder_cleanup(encoder);
 	kfree(intel_dig_port);
 }
 
-- 
1.7.10.4
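
The ordering problem described in the commit message can be shown with a toy model: any step that still needs the encoder must run before the encoder is cleaned up. The structures and function names below are hypothetical simplifications, not the i915 code.

```c
#include <stdbool.h>

/* Hypothetical, simplified objects; not the real i915 structures. */
struct toy_encoder {
	bool cleaned_up;
};

struct toy_port {
	struct toy_encoder enc;
	bool is_edp;
	bool panel_step_done;
};

/* Needs a live encoder, like the pps_lock path in the commit message. */
static bool locked_panel_step(struct toy_port *port)
{
	if (port->enc.cleaned_up)
		return false;  /* would be a use-after-cleanup in real code */
	port->panel_step_done = true;
	return true;
}

static void encoder_cleanup(struct toy_port *port)
{
	port->enc.cleaned_up = true;
}

/* Fixed ordering: everything that uses the encoder runs before cleanup. */
static bool port_destroy(struct toy_port *port)
{
	bool ok = true;

	if (port->is_edp)
		ok = locked_panel_step(port);
	encoder_cleanup(port);
	return ok;
}
```

Calling encoder_cleanup() first and locked_panel_step() afterwards makes the second call fail in this model, which mirrors the bug the patch addresses.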



Re: [Intel-gfx] [Mesa-dev] [PATCH] drm/i915: Enable the HiZ RAW Stall Optimization on Gen8.

2015-01-11 Thread Ben Widawsky
On Sat, Jan 10, 2015 at 06:44:49PM -0800, Kenneth Graunke wrote:
 This is an important optimization for avoiding read-after-write (RAW)
 stalls in the HiZ buffer.  Certain workloads would run very slowly with
 HiZ enabled, but run much faster with the hiz=false driconf option.
 With this patch, they run at full speed even with HiZ.
 
 Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
 (Iris Pro 6200).
 
 Thanks to Jesse Barnes for finding this missing bit!
 Thanks to Chris Wilson for helping me find where to set it.
 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 Cc: Jesse Barnes jbar...@virtuousgeek.org
 ---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++
  1 file changed, 15 insertions(+)
 
 Here's an alternate patch which implements the workaround in the kernel
 instead of Mesa.  It's probably better to do it there, since the kernel
 does it on Haswell already.
 
 diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
 b/drivers/gpu/drm/i915/intel_ringbuffer.c
 index dabc1d8..23020d6 100644
 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
 +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
 @@ -796,6 +796,16 @@ static int bdw_init_workarounds(struct intel_engine_cs 
 *ring)
 HDC_DONOT_FETCH_MEM_WHEN_MASKED |
 (IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE : 0));
  
 + /* From the Haswell PRM, Command Reference: Registers, CACHE_MODE_0:
 +  * The Hierarchical Z RAW Stall Optimization allows non-overlapping
 +  *  polygons in the same 8x4 pixel/sample area to be processed without
 +  *  stalling waiting for the earlier ones to write to Hierarchical Z
 +  *  buffer.
 +  *
 +  * This optimization is off by default for Broadwell; turn it on.
 +  */
 + WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
 +
   /* Wa4x4STCOptimizationDisable:bdw */
   WA_SET_BIT_MASKED(CACHE_MODE_1,
 GEN8_4x4_STC_OPTIMIZATION_DISABLE);
 @@ -836,6 +846,11 @@ static int chv_init_workarounds(struct intel_engine_cs 
 *ring)
 HDC_FORCE_NON_COHERENT |
 HDC_DONOT_FETCH_MEM_WHEN_MASKED);
  
 + /* According to the CACHE_MODE_0 default value documentation, some
 +  * CHV platforms disable this optimization by default.  Turn it on.
 +  */
 + WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
 +
   /* Improve HiZ throughput on CHV. */
   WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
  

I think you should do this as two separate patches, 1 per platform. For the BSW
patch (given that I had the same functionality in the kernel patch I asked you
to look at ;-) and FWIW, Jordan has numbers on BSW B-step with my kernel patch
which we can use for the commit):
Signed-off-by: Ben Widawsky b...@bwidawsk.net

I haven't looked at Broadwell docs, so I'll let someone else take care of that.

I don't know if I agree with Chris that we should call these in the workaround
section, but whatever. init_clock_gating is equally sucky.
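
For readers unfamiliar with the masked-register convention behind WA_SET_BIT_MASKED and WA_CLR_BIT_MASKED, the following is a minimal user-space sketch, not the driver code. It assumes the usual convention for these registers, where the upper 16 bits of the written value select which of the lower 16 bits get updated. The helper names and the simulated register are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelled on the masked-bit convention: the high
 * 16 bits act as a per-bit write enable for the low 16 bits. */
static inline uint32_t masked_bit_enable(uint32_t bit)
{
	assert((bit & 0xffff0000u) == 0);
	return (bit << 16) | bit;	/* enable the write and set the bit */
}

static inline uint32_t masked_bit_disable(uint32_t bit)
{
	assert((bit & 0xffff0000u) == 0);
	return bit << 16;		/* enable the write, leave the bit clear */
}

/* Simulate what the hardware does with a write to a masked register:
 * only bits selected by the mask change, all others keep their value. */
static uint32_t masked_write(uint32_t old, uint32_t value)
{
	uint32_t mask = value >> 16;
	uint32_t bits = value & 0xffffu;

	return (old & ~mask & 0xffffu) | (bits & mask);
}
```

Under this convention no read-modify-write cycle is needed: clearing a disable bit is a single write of masked_bit_disable(bit), and the other bits in the register are left untouched.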


Re: [Intel-gfx] [PATCH] [v2] intel_frequency: A tool to manipulate Intel GPU frequency

2015-01-11 Thread Ben Widawsky
On Sun, Jan 11, 2015 at 07:35:21PM -0800, Ben Widawsky wrote:
 WARNING: very minimally tested
 
 In general you should not need this tool. It's primary purpose is for
 benchmarking, and for debugging performance issues.

I noticed the it's vs its on v1, but forgot to fix it. IT'S fixed locally
though.

 
 For many kernel releases now sysfs has supported reading and writing the GPU
 frequency. Therefore, this tool provides no new functionality. What it does
 provide is an easy to package (for distros) tool that handles the most common
 scenarios.
 
 v2:
 Get rid of -f from the usage message (Jordan)
 Add space before [-s (Jordan)
 Add a -c/--custom example (Jordan)
 Add a setting for resetting to hardware default (Ken)
 Replicate examples in commit message in the source code. (me)
 
 Signed-off-by: Ben Widawsky b...@bwidawsk.net
 Reviewed-by: Jordan Justen jordan.l.jus...@intel.com
 Cc: Kenneth Graunke kenn...@whitecape.org
 
 Here are some sample usages:
 $ sudo intel_frequency --get=cur,min,max,eff
 cur: 200 MHz
 min: 200 MHz
 RP1: 200 MHz
 max: 1200 MHz
 
 $ sudo intel_frequency -g
 cur: 200 MHz
 min: 200 MHz
 RP1: 200 MHz
 max: 1200 MHz
 
 $ sudo intel_frequency -geff
 RP1: 200 MHz
 
 $ sudo intel_frequency --set min=300
 $ sudo intel_frequency --get min
 cur: 300 MHz
 min: 300 MHz
 RP1: 200 MHz
 max: 1200 MHz
 
 $ sudo intel_frequency --custom max=900
 $ sudo intel_frequency --get max
 cur: 300 MHz
 min: 300 MHz
 RP1: 200 MHz
 max: 900 MHz
 
 $ sudo intel_frequency --max
 $ sudo intel_frequency -g
 cur: 1200 MHz
 min: 1200 MHz
 RP1: 200 MHz
 max: 1200 MHz
 
 $ sudo intel_frequency -e
 $ sudo intel_frequency -g
 cur: 200 MHz
 min: 200 MHz
 RP1: 200 MHz
 max: 200 MHz
 
 $ sudo intel_frequency --max
 $ sudo intel_frequency -g
 cur: 1200 MHz
 min: 1200 MHz
 RP1: 200 MHz
 max: 1200 MHz
 
 $ sudo intel_frequency --min
 $ sudo intel_frequency -g
 cur: 200 MHz
 min: 200 MHz
 RP1: 200 MHz
 max: 200 MHz
 ---
  tools/Makefile.sources  |   1 +
  tools/intel_frequency.c | 363 
 
  2 files changed, 364 insertions(+)
  create mode 100644 tools/intel_frequency.c
 
 diff --git a/tools/Makefile.sources b/tools/Makefile.sources
 index b85a6b8..2bea389 100644
 --- a/tools/Makefile.sources
 +++ b/tools/Makefile.sources
 @@ -14,6 +14,7 @@ bin_PROGRAMS =  \
   intel_dump_decode   \
   intel_error_decode  \
   intel_forcewaked\
 + intel_frequency \
   intel_framebuffer_dump  \
   intel_gpu_time  \
   intel_gpu_top   \
 diff --git a/tools/intel_frequency.c b/tools/intel_frequency.c
 new file mode 100644
 index 000..59f3814
 --- /dev/null
 +++ b/tools/intel_frequency.c
 @@ -0,0 +1,363 @@
 +/*
 + * Copyright © 2015 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the "Software"),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 + * DEALINGS IN THE SOFTWARE.
 + *
 + * Example:
 + * Get all frequencies:
 + * intel_frequency --get=cur,min,max,eff
 + *
 + * Same as above:
 + * intel_frequency -g
 + *
 + * Get the efficient frequency:
 + * intel_frequency -geff
 + *
 + * Lock the GPU frequency to 300MHz:
 + * intel_frequency --set min=300
 + *
 + * Set the maximum frequency to 900MHz:
 + * intel_frequency --custom max=900
 + *
 + * Lock the GPU frequency to its maximum frequency:
 + * intel_frequency --max
 + *
 + * Lock the GPU frequency to its most efficient frequency:
 + * intel_frequency -e
 + *
 + * Lock The GPU frequency to its minimum frequency:
 + * intel_frequency --min
 + *
 + * Reset the GPU to hardware defaults
 + * intel_frequency -d
 + */
 +
 +#define _GNU_SOURCE
 +#include <assert.h>
 +#include <getopt.h>
 +#include <stdio.h>
 +#include <time.h>
 +#include <unistd.h>
 +
 +#include "drmtest.h"
 +#include "intel_chipset.h"
 +
 +static int device, devid;
 +
 +enum {
 + CUR=0,
 + MIN,
 + EFF,
 + MAX,
 + RP0,
 + RPn

Re: [Intel-gfx] [Mesa-dev] [PATCH] drm/i915: Enable the HiZ RAW Stall Optimization on Gen8.

2015-01-11 Thread Ben Widawsky
On Sun, Jan 11, 2015 at 06:53:32PM -0800, Kenneth Graunke wrote:

[snip]

 
 Jesse had suggested setting it in broadwell_init_clock_gating on January 5th,
 and Valtteri tried it on January 7th.  He found no noticeable difference.
 I tried it again, and confirmed his result: there was zero performance impact.
 
 Setting it via an LRI in Mesa did have a performance impact.  I reverted my
 Mesa patch, and tried setting it here, and it had the same performance impact.
 I rebooted between kernels several times to confirm.  It works here, but it
 doesn't there.
 
 I'm pretty sure I confirmed the same result with this bit.  Feel free to try.
 

That's okay. I believe you, I just thought you may have known something I 
didn't.

 Perhaps we should move the rest of the per-context bits here instead of
 *_init_clock_gating.  We should also confirm that the other bits are actually
 having an effect.

If this is the behavior we're getting, we should absolutely do this.

 
 I don't know why it works on Haswell, but it does there - the HiZ RAW stall
 bit is set via haswell_init_clock_gating, and it's clearly having an impact.
 Maybe it has something to do with the golden context, which is new on BDW.
 But I'm probably wrong about that.  Setting it when a context is active does
 seem more reliable...

The golden context stuff has been backported to all gens that support hardware
contexts and/or execlists, so I don't think it's that.

The big difference between the workarounds and init clock gating is the former
is done via LRI, and the latter MMIO. (also, the latter is run again on resume,
and I don't think the former is).

Thinking out loud: what's the default setting for execlists on BDW now?  For
execlists my plan (when it was my plan to have) had always been to manually set
the register in the context image before loading it. We don't do that with the
existing code; we use the old ringbuffer style of hoping it preserves the
contents. I wonder if that's the distinction from HSW.
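
To make the LRI-versus-MMIO distinction concrete, here is a rough sketch of what a single-register LRI looks like as command-stream dwords. This is an illustration only: the opcode (0x22), the length encoding, and the register offset and bit value used for CACHE_MODE_0 and the HiZ RAW stall disable bit are assumptions recalled from the i915 headers, not taken from this thread, and every SKETCH_ name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed MI command encoding: opcode in the high bits, length in the low bits. */
#define SKETCH_MI_INSTR(opcode, flags)   (((uint32_t)(opcode) << 23) | (flags))
#define SKETCH_MI_LOAD_REGISTER_IMM(n)   SKETCH_MI_INSTR(0x22, 2 * (n) - 1)
#define SKETCH_MI_NOOP                   SKETCH_MI_INSTR(0, 0)

/* Assumed values, for illustration only. */
#define SKETCH_CACHE_MODE_0              0x7000u
#define SKETCH_HIZ_RAW_STALL_OPT_DISABLE (1u << 2)

/* Emit an LRI that clears one bit of a masked register.
 * Returns the number of dwords written, or 0 if buf is too small. */
static size_t emit_lri_clear_masked(uint32_t *buf, size_t len,
				    uint32_t reg, uint32_t bit)
{
	if (len < 4)
		return 0;

	buf[0] = SKETCH_MI_LOAD_REGISTER_IMM(1);	/* one register/value pair */
	buf[1] = reg;					/* register offset */
	buf[2] = bit << 16;				/* mask set, value bit clear */
	buf[3] = SKETCH_MI_NOOP;			/* pad to an even dword count */
	return 4;
}
```

Because the command streamer executes these dwords while a context is active, the resulting register value can be saved with that context, which an MMIO write from init_clock_gating does not guarantee.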

 
 --Ken




Re: [Intel-gfx] [Mesa-dev] [PATCH] drm/i915: Enable the HiZ RAW Stall Optimization on Gen8.

2015-01-11 Thread Kenneth Graunke
On Sunday, January 11, 2015 01:49:41 PM Ben Widawsky wrote:
 On Sat, Jan 10, 2015 at 06:44:49PM -0800, Kenneth Graunke wrote:
  This is an important optimization for avoiding read-after-write (RAW)
  stalls in the HiZ buffer.  Certain workloads would run very slowly with
  HiZ enabled, but run much faster with the hiz=false driconf option.
  With this patch, they run at full speed even with HiZ.
  
  Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
  (Iris Pro 6200).
  
  Thanks to Jesse Barnes for finding this missing bit!
  Thanks to Chris Wilson for helping me find where to set it.
  
  Signed-off-by: Kenneth Graunke kenn...@whitecape.org
  Cc: Jesse Barnes jbar...@virtuousgeek.org
  ---
   drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++
   1 file changed, 15 insertions(+)
  
  Here's an alternate patch which implements the workaround in the kernel
  instead of Mesa.  It's probably better to do it there, since the kernel
  does it on Haswell already.
  
  diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
  b/drivers/gpu/drm/i915/intel_ringbuffer.c
  index dabc1d8..23020d6 100644
  --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
  +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
  @@ -796,6 +796,16 @@ static int bdw_init_workarounds(struct intel_engine_cs 
  *ring)
HDC_DONOT_FETCH_MEM_WHEN_MASKED |
(IS_BDW_GT3(dev) ? HDC_FENCE_DEST_SLM_DISABLE : 0));
   
  +   /* From the Haswell PRM, Command Reference: Registers, CACHE_MODE_0:
  +* The Hierarchical Z RAW Stall Optimization allows non-overlapping
  +*  polygons in the same 8x4 pixel/sample area to be processed without
  +*  stalling waiting for the earlier ones to write to Hierarchical Z
  +*  buffer.
  +*
  +* This optimization is off by default for Broadwell; turn it on.
  +*/
  +   WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
  +
  /* Wa4x4STCOptimizationDisable:bdw */
  WA_SET_BIT_MASKED(CACHE_MODE_1,
GEN8_4x4_STC_OPTIMIZATION_DISABLE);
  @@ -836,6 +846,11 @@ static int chv_init_workarounds(struct intel_engine_cs 
  *ring)
HDC_FORCE_NON_COHERENT |
HDC_DONOT_FETCH_MEM_WHEN_MASKED);
   
  +   /* According to the CACHE_MODE_0 default value documentation, some
  +* CHV platforms disable this optimization by default.  Turn it on.
  +*/
  +   WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
  +
  /* Improve HiZ throughput on CHV. */
  WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
   
 
 I think you should do this as two separate patches, 1 per platform. For the 
 BSW
 patch (given that I had the same functionality in the kernel patch I asked you
 to look at ;-) and FWIW, Jordan has numbers on BSW B-step with my kernel patch
 which we can use for the commit):
 Signed-off-by: Ben Widawsky b...@bwidawsk.net

Huh, I don't recall seeing that kernel patch.  Sorry.  I guess I'll split it
and resubmit...

 I haven't looked at Broadwell docs, so I'll let someone else take care of 
 that.
 
 I don't know if I agree with Chris that we should call these in the workaround
 section, but whatever. init_clock_gating is equally sucky.

init_clock_gating doesn't work.  The register writes don't stick and they have
no effect at all.  Setting them here makes them actually take effect in the
context.

--Ken



[Intel-gfx] [PATCH] intel_audio_dump: add support for Cherryview

2015-01-11 Thread Yang, Libin
From ebfde852d9efbd7213c391e91be9d0741813bb16 Mon Sep 17 00:00:00 2001
From: Libin Yang libin.y...@intel.com
Date: Wed, 7 Jan 2015 10:56:18 +0800
Subject: [PATCH] intel_audio_dump: add support for Cherryview

This patch adds support for dumping audio registers of Cherryview.

Signed-off-by: Libin Yang libin.y...@intel.com
---
 lib/intel_reg.h  |  2 ++
 tools/intel_audio_dump.c | 93 ++--
 2 files changed, 92 insertions(+), 3 deletions(-)

diff --git a/lib/intel_reg.h b/lib/intel_reg.h
index fcc9d7c..ade1c0c 100644
--- a/lib/intel_reg.h
+++ b/lib/intel_reg.h
@@ -1274,6 +1274,8 @@ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 #define SDVO_PIPE_B_SELECT (1 << 30)
 #define SDVO_STALL_SELECT  (1 << 29)
 #define SDVO_INTERRUPT_ENABLE  (1 << 26)
+
+#define DISPLAY_HOTPLUG_CTL 0x61164
 /*
  * 915G/GM SDVO pixel multiplier.
  *
diff --git a/tools/intel_audio_dump.c b/tools/intel_audio_dump.c
index f3bb9eb..b673288 100644
--- a/tools/intel_audio_dump.c
+++ b/tools/intel_audio_dump.c
@@ -69,19 +69,19 @@ static int disp_reg_base = 0;   /* base address of 
display registers */
 #define dump_reg(reg, desc)\
do {\
dword = INREG(reg); \
-   printf("%-21s 0x%08x  %s\n", # reg, dword, desc);   \
+   printf("%-21s(%#x) 0x%08x  %s\n", # reg, reg, dword, desc); \
} while (0)
 
 #define dump_disp_reg(reg, desc)   \
do {\
dword = INREG(disp_reg_base + reg); \
-   printf("%-21s 0x%08x  %s\n", # reg, dword, desc);   \
+   printf("%-21s(%#x) 0x%08x  %s\n", # reg, reg, dword, desc); \
} while (0)
 
 #define dump_aud_reg(reg, desc)\
do {\
dword = INREG(aud_reg_base + reg);  \
-   printf("%-21s 0x%08x  %s\n", # reg, dword, desc);   \
+   printf("%-21s(%#x) 0x%08x  %s\n", # reg, reg, dword, desc); \
} while (0)
 
 #define read_aud_reg(reg)  INREG(aud_reg_base + (reg))
@@ -1771,6 +1771,9 @@ static void dump_aud_hdmi_status(void)
 #define HDMI_CTL_B 0x1140
 #define HDMI_CTL_C 0x1150
 #define HDMI_CTL_D 0x1160
+#define BSW_HDMI_CTL_B 0x1140
+#define BSW_HDMI_CTL_C 0x1160
+#define BSW_HDMI_CTL_D 0x116c
 
 /* VLV HDMI port ctrl */
 #define SDVO_HDMI_CTL_B0x1140
@@ -2108,6 +2111,10 @@ static void dump_hsw_plus(void)
 
set_aud_reg_base(0x65000);
 
+   dump_reg(PORT_HOTPLUG_EN, "port hotplug enable");
+   dump_reg(PORT_HOTPLUG_STAT, "port hotplug status");
+   dump_reg(DISPLAY_HOTPLUG_CTL, "display hotplug control");
+
/* HSW DDI Buffer */
dump_reg(DDI_BUF_CTL_A,"DDI Buffer Controler A");
dump_reg(DDI_BUF_CTL_B,"DDI Buffer Controler B");
@@ -2267,6 +2274,83 @@ static void dump_hsw_plus(void)
printf("\n");
 }
 
+/* offset of hotplug enable */
+#define PORT_HOTPLUG_EN_OFFSET 0x1110
+/* offset of hotplug status */
+#define PORT_HOTPLUG_STAT_OFFSET 0x1114
+/* offset of hotplug control*/
+#define DISPLAY_HOTPLUG_CTL_OFFSET 0x1164
+/* dump the braswell registers for audio */
+static void dump_braswell(void)
+{
+   uint32_t dword;
+
+   /* set_aud_reg_base(0x62000 + VLV_DISPLAY_BASE); */
+   set_reg_base(0x60000 + VLV_DISPLAY_BASE, 0x2000);
+
+
+   dump_disp_reg(PORT_HOTPLUG_EN_OFFSET, "port hotplug enable");
+   dump_disp_reg(PORT_HOTPLUG_STAT_OFFSET, "port hotplug status");
+   dump_disp_reg(DISPLAY_HOTPLUG_CTL_OFFSET, "display hotplug control");
+
+   dump_disp_reg(BSW_HDMI_CTL_B,   "sDVO/HDMI Port B Control");
+   dump_disp_reg(BSW_HDMI_CTL_C,   "HDMI Port C Control"); // The address is wrong?
+   dump_disp_reg(BSW_HDMI_CTL_D,   "HDMI Port D Control");
+
+   dump_disp_reg(DP_CTL_B, "DisplayPort B Control Register");
+   dump_disp_reg(DP_CTL_C, "DisplayPort C Control Register");
+   dump_disp_reg(DP_CTL_D, "DisplayPort D Control Register");
+
+   /* HSW North Display Audio */
+   dump_aud_reg(AUD_TCA_CONFIG,   "Audio Configuration - Transcoder A");
+   dump_aud_reg(AUD_TCB_CONFIG,   "Audio Configuration - Transcoder B");
+   dump_aud_reg(AUD_TCC_CONFIG,   "Audio Configuration - Transcoder C");
+   dump_aud_reg(AUD_C1_MISC_CTRL, "Audio Converter 1 MISC Control");
+   dump_aud_reg(AUD_C2_MISC_CTRL, "Audio Converter 2 MISC Control");
+   dump_aud_reg(AUD_C3_MISC_CTRL, "Audio Converter 3 MISC Control");
+   dump_aud_reg(AUD_VID_DID,  "Audio Vendor ID / Device ID");
+   

Re: [Intel-gfx] [PATCH] drm/i915: Improve HiZ throughput on Cherryview.

2015-01-11 Thread shuang . he
Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang...@intel.com)
-Summary-
Platform  Delta   drm-intel-nightly  Series Applied
PNV               354/354            354/354
ILK               201/201            201/201
SNB       +2-2    401/424            401/424
IVB               488/488            488/488
BYT               278/278            278/278
HSW       -1      529/529            528/529
BDW       -1      405/405            404/405
-Detailed-
Platform  Test  drm-intel-nightly  Series Applied
*SNB  igt_kms_flip_event_leak  NSPT(3, M35)  PASS(1, M35)
 SNB  igt_kms_flip_flip-vs-dpms-off-vs-modeset-interruptible  NSPT(1, M35)PASS(2, M35)  PASS(1, M35)
*SNB  igt_kms_flip_tiling_flip-changes-tiling  PASS(2, M35)  TIMEOUT(1, M35)
*SNB  igt_gem_concurrent_blit_gtt-rcs-early-read-interruptible  PASS(3, M35)  DMESG_WARN(1, M35)
 HSW  igt_kms_flip_dpms-vs-vblank-race  DMESG_WARN(2, M40)PASS(1, M20)  DMESG_WARN(1, M40)
*BDW  igt_gem_concurrent_blit_gtt-bcs-gpu-read-after-write-interruptible  PASS(2, M30)  DMESG_WARN(1, M30)
Note: Pay particular attention to lines starting with '*'.


[Intel-gfx] [PATCH] [v2] intel_frequency: A tool to manipulate Intel GPU frequency

2015-01-11 Thread Ben Widawsky
WARNING: very minimally tested

In general you should not need this tool. It's primary purpose is for
benchmarking, and for debugging performance issues.

For many kernel releases now sysfs has supported reading and writing the GPU
frequency. Therefore, this tool provides no new functionality. What it does
provide is an easy to package (for distros) tool that handles the most common
scenarios.
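
As a rough illustration of the sysfs interface mentioned above, the sketch below reads and writes a frequency value through a file path. It is not the tool's implementation. The helper names are hypothetical, and the example path in the note that follows (gt_cur_freq_mhz under /sys/class/drm/card0) is an assumption about a typical i915 system; the card index in particular varies.

```c
#include <stdio.h>

/* Read one integer MHz value from a sysfs-style file.
 * Returns the frequency, or -1 on any error. */
static int read_freq_mhz(const char *path)
{
	FILE *f = fopen(path, "r");
	int mhz = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &mhz) != 1)
		mhz = -1;
	fclose(f);
	return mhz;
}

/* Write a MHz value to a sysfs-style file; real sysfs needs root.
 * Returns 0 on success, -1 on failure. */
static int write_freq_mhz(const char *path, int mhz)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", mhz) > 0;
	if (fclose(f) != 0)	/* sysfs reports write errors at flush/close */
		ok = 0;
	return ok ? 0 : -1;
}
```

On a real system one would call, for example, read_freq_mhz("/sys/class/drm/card0/gt_cur_freq_mhz"), with the path adjusted to the local card.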

v2:
Get rid of -f from the usage message (Jordan)
Add space before [-s (Jordan)
Add a -c/--custom example (Jordan)
Add a setting for resetting to hardware default (Ken)
Replicate examples in commit message in the source code. (me)

Signed-off-by: Ben Widawsky b...@bwidawsk.net
Reviewed-by: Jordan Justen jordan.l.jus...@intel.com
Cc: Kenneth Graunke kenn...@whitecape.org

Here are some sample usages:
$ sudo intel_frequency --get=cur,min,max,eff
cur: 200 MHz
min: 200 MHz
RP1: 200 MHz
max: 1200 MHz

$ sudo intel_frequency -g
cur: 200 MHz
min: 200 MHz
RP1: 200 MHz
max: 1200 MHz

$ sudo intel_frequency -geff
RP1: 200 MHz

$ sudo intel_frequency --set min=300
$ sudo intel_frequency --get min
cur: 300 MHz
min: 300 MHz
RP1: 200 MHz
max: 1200 MHz

$ sudo intel_frequency --custom max=900
$ sudo intel_frequency --get max
cur: 300 MHz
min: 300 MHz
RP1: 200 MHz
max: 900 MHz

$ sudo intel_frequency --max
$ sudo intel_frequency -g
cur: 1200 MHz
min: 1200 MHz
RP1: 200 MHz
max: 1200 MHz

$ sudo intel_frequency -e
$ sudo intel_frequency -g
cur: 200 MHz
min: 200 MHz
RP1: 200 MHz
max: 200 MHz

$ sudo intel_frequency --max
$ sudo intel_frequency -g
cur: 1200 MHz
min: 1200 MHz
RP1: 200 MHz
max: 1200 MHz

$ sudo intel_frequency --min
$ sudo intel_frequency -g
cur: 200 MHz
min: 200 MHz
RP1: 200 MHz
max: 200 MHz
---
 tools/Makefile.sources  |   1 +
 tools/intel_frequency.c | 363 
 2 files changed, 364 insertions(+)
 create mode 100644 tools/intel_frequency.c

diff --git a/tools/Makefile.sources b/tools/Makefile.sources
index b85a6b8..2bea389 100644
--- a/tools/Makefile.sources
+++ b/tools/Makefile.sources
@@ -14,6 +14,7 @@ bin_PROGRAMS =\
intel_dump_decode   \
intel_error_decode  \
intel_forcewaked\
+   intel_frequency \
intel_framebuffer_dump  \
intel_gpu_time  \
intel_gpu_top   \
diff --git a/tools/intel_frequency.c b/tools/intel_frequency.c
new file mode 100644
index 000..59f3814
--- /dev/null
+++ b/tools/intel_frequency.c
@@ -0,0 +1,363 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Example:
+ * Get all frequencies:
+ * intel_frequency --get=cur,min,max,eff
+ *
+ * Same as above:
+ * intel_frequency -g
+ *
+ * Get the efficient frequency:
+ * intel_frequency -geff
+ *
+ * Lock the GPU frequency to 300MHz:
+ * intel_frequency --set min=300
+ *
+ * Set the maximum frequency to 900MHz:
+ * intel_frequency --custom max=900
+ *
+ * Lock the GPU frequency to its maximum frequency:
+ * intel_frequency --max
+ *
+ * Lock the GPU frequency to its most efficient frequency:
+ * intel_frequency -e
+ *
+ * Lock The GPU frequency to its minimum frequency:
+ * intel_frequency --min
+ *
+ * Reset the GPU to hardware defaults
+ * intel_frequency -d
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <getopt.h>
+#include <stdio.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "drmtest.h"
+#include "intel_chipset.h"
+
+static int device, devid;
+
+enum {
+   CUR=0,
+   MIN,
+   EFF,
+   MAX,
+   RP0,
+   RPn
+};
+
+struct freq_info {
+   const char *name;
+   const char *mode;
+   FILE *filp;
+   char *path;
+};
+
+static struct freq_info info[] = {
+   { "cur", "r"  },
+   { "min", "rb+" },
+   { "RP1", "r" },
+   { "max", "rb+" },
+   { "RP0", "r" },
+   { "RPn", "r" }
+};
+

Re: [Intel-gfx] [PATCH] intel_audio_dump: add support for Cherryview

2015-01-11 Thread Zhenyu Wang
On 2015.01.12 01:38:34 +, Yang, Libin wrote:
 From ebfde852d9efbd7213c391e91be9d0741813bb16 Mon Sep 17 00:00:00 2001
 From: Libin Yang libin.y...@intel.com
 Date: Wed, 7 Jan 2015 10:56:18 +0800
 Subject: [PATCH] intel_audio_dump: add support for Cherryview
 
 This patch adds support for dumping audio registers of Cherryview.
 
 Signed-off-by: Libin Yang libin.y...@intel.com

Just cleaned up your commit message and pushed the patch.

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827




Re: [Intel-gfx] [PATCH v4 0/6] sanitize hda/i915 interface using the component fw

2015-01-11 Thread Takashi Iwai
At Mon, 12 Jan 2015 02:32:16 +0100,
Daniel Vetter wrote:
 
 On Fri, Jan 9, 2015 at 11:20 AM, Takashi Iwai ti...@suse.de wrote:
  At Fri, 9 Jan 2015 10:18:45 +0100,
  Daniel Vetter wrote:
 
  On Thu, Jan 08, 2015 at 05:54:12PM +0200, Imre Deak wrote:
   This is v4 of [1] addressing the review comments from Takashi and Jani.
  
   [1]
   http://lists.freedesktop.org/archives/intel-gfx/2014-December/056992.html
  
   Imre Deak (6):
 drm/i915: add dev_to_i915 helper
 drm/i915: add component support
 ALSA: hda: export struct hda_intel
 ALSA: hda: pass intel_hda to all i915 interface functions
 ALSA: hda: add component support
 drm/i915: remove unused power_well/get_cdclk_freq api
 
  All merged to dinq, thanks.
 
  Daniel, could you give a clean branch?
  This will change lots in hd-audio driver code, so I'd like to pull it
  into my tree for further development, too.
 
 Yeah my plan is to create a branch plus send you a pull request. Just
 wanted to give these patches some testing before heading out to lca.

OK, thanks!


Takashi


Re: [Intel-gfx] [RFC v2 1/4] drm: Add support to find drm_panel by name

2015-01-11 Thread Kumar, Shobhit

On 1/9/2015 6:20 PM, Jani Nikula wrote:

On Fri, 02 Jan 2015, Shobhit Kumar shobhit.ku...@intel.com wrote:

For scenarios where OF is not available, we can use panel identification by
name.

Signed-off-by: Shobhit Kumar shobhit.ku...@intel.com
---
  drivers/gpu/drm/drm_panel.c | 18 ++
  include/drm/drm_panel.h |  3 +++
  2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/drm_panel.c b/drivers/gpu/drm/drm_panel.c
index 2ef988e..773ebd6 100644
--- a/drivers/gpu/drm/drm_panel.c
+++ b/drivers/gpu/drm/drm_panel.c
@@ -95,6 +95,24 @@ struct drm_panel *of_drm_find_panel(struct device_node *np)
  EXPORT_SYMBOL(of_drm_find_panel);
  #endif

+struct drm_panel *name_drm_find_panel(const char *name)
+{
+   struct drm_panel *panel;
+
+   mutex_lock(&panel_lock);
+
+   list_for_each_entry(panel, &panel_list, list) {
+   if (strcmp(panel->name, name) == 0) {
+   mutex_unlock(&panel_lock);
+   return panel;
+   }
+   }
+
+   mutex_unlock(&panel_lock);
+   return NULL;
+}
+EXPORT_SYMBOL(name_drm_find_panel);


This patch needs to be sent to drm-devel.

The name should probably be something like drm_find_panel_by_name.

I have a slightly uneasy feeling about handing out drm_panel pointers
(both from here and of_drm_find_panel) without refcounting. If the panel
driver gets removed, whoever called the find functions will have a
dangling pointer. I suppose this will be discussed on drm-devel.
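
The refcounting concern can be illustrated with a small user-space sketch. This is not a proposal for the drm_panel API: struct sketch_panel and both helpers are hypothetical stand-ins, and locking is left out. The idea is that the lookup takes a reference before returning the pointer, so the caller's pointer stays valid until the caller drops it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct drm_panel; not the kernel type. */
struct sketch_panel {
	const char *name;
	int refcount;
	struct sketch_panel *next;
};

/* Look up a panel by name and take a reference before returning it.
 * In the kernel this walk would run with panel_lock held, so the
 * reference is taken before the entry can be unregistered. */
static struct sketch_panel *sketch_find_panel_by_name(struct sketch_panel *head,
						      const char *name)
{
	struct sketch_panel *p;

	for (p = head; p; p = p->next) {
		if (strcmp(p->name, name) == 0) {
			p->refcount++;
			return p;
		}
	}
	return NULL;
}

/* Drop a reference; returns nonzero when the last reference is gone
 * and the owner may free the panel. */
static int sketch_panel_put(struct sketch_panel *p)
{
	return --p->refcount == 0;
}
```

With this shape, removing the panel driver only drops the registry's own reference, and the memory is released when the last user calls the put helper.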


Right, it's a valid point. I will post the updated patch there and see
what comes up.


Regards
Shobhit