Re: Why was old TTM removed from drm.git?

2009-06-25 Thread Michel Dänzer
On Wed, 2009-06-24 at 19:26 +0200, Thomas Hellström wrote:
>
> Just prior to the commit I sent out a message explaining what I was
> going to do and why, but apparently it didn't make it to the list
> (which seems to be the case of quite a few mails these days).

What was the From: address and subject of that mail (or any others that
were apparently lost)? I can't seem to find anything in the dri-devel
moderation queue mails around the weekend, so apparently it was dropped
before it reached mailman. Maybe some sf.net spam filter or something.


-- 
Earthling Michel Dänzer   |http://www.vmware.com
Libre software enthusiast |  Debian, X and DRI developer

--
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


[PATCH 2/2] drm/i915: Adjust DisplayPort clocks to use 96MHz reference

2009-06-25 Thread Keith Packard
For some reason, the DP clocks were based off a 100MHz reference instead of
the standard 96MHz reference. This caused some DP monitors to fail to lock
to the signal.

Signed-off-by: Keith Packard <kei...@keithp.com>
---
 drivers/gpu/drm/i915/intel_display.c |   21 +++++++++------------
 1 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 73e7b9c..a4a2a03 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -814,24 +814,21 @@ intel_find_pll_g4x_dp(const intel_limit_t *limit, struct drm_crtc *crtc,
 {
     intel_clock_t clock;
     if (target < 200000) {
-        clock.dot = 161670;
-        clock.p = 20;
         clock.p1 = 2;
         clock.p2 = 10;
-        clock.n = 0x01;
-        clock.m = 97;
-        clock.m1 = 0x10;
-        clock.m2 = 0x05;
+        clock.n = 2;
+        clock.m1 = 23;
+        clock.m2 = 8;
     } else {
-        clock.dot = 270000;
-        clock.p = 10;
         clock.p1 = 1;
         clock.p2 = 10;
-        clock.n = 0x02;
-        clock.m = 108;
-        clock.m1 = 0x12;
-        clock.m2 = 0x06;
+        clock.n = 1;
+        clock.m1 = 14;
+        clock.m2 = 2;
     }
+    clock.m = 5 * (clock.m1 + 2) + (clock.m2 + 2);
+    clock.p = (clock.p1 * clock.p2);
+    clock.dot = 96000 * clock.m / (clock.n + 2) / clock.p;
     memcpy(best_clock, &clock, sizeof(intel_clock_t));
     return true;
 }
-- 
1.6.3.1
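
The three formulas the patch adds can be checked by hand. The sketch below is a stand-alone user-space rendition of them, not kernel code: the struct name `dp_clock` and the helper `dp_clock_derive` are made up for illustration, and only the arithmetic and the divider values are taken from the patch. All clocks are assumed to be in kHz, as the 96000 constant suggests.

```c
/* Illustrative stand-in for the fields of intel_clock_t used by the patch. */
struct dp_clock {
    int n, m1, m2, p1, p2;  /* divider settings picked by the patch */
    int m, p, dot;          /* derived values; dot is in kHz */
};

/* Same three formulas as the patch, with the 96MHz reference in kHz. */
static void dp_clock_derive(struct dp_clock *c)
{
    c->m = 5 * (c->m1 + 2) + (c->m2 + 2);
    c->p = c->p1 * c->p2;
    c->dot = 96000 * c->m / (c->n + 2) / c->p;
}
```

With the low-rate settings (n = 2, m1 = 23, m2 = 8, p1 = 2, p2 = 10) this gives m = 135, p = 20 and dot = 162000. With the high-rate settings (n = 1, m1 = 14, m2 = 2, p1 = 1, p2 = 10) it gives m = 84, p = 10 and dot = 268800.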




[PATCH 1/2] drm: Fix EDID detailed timing misc flags decoding

2009-06-25 Thread Keith Packard
The change from bitfields to masks was done incorrectly for the misc flags
byte.

Signed-off-by: Keith Packard <kei...@keithp.com>
---
 drivers/gpu/drm/drm_edid.c |   19 
 include/drm/drm_edid.h |   66 ++--
 2 files changed, 57 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 7d08352..c84b306 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -303,12 +303,16 @@ static struct drm_display_mode *drm_mode_detailed(struct drm_device *dev,
     if (hactive < 64 || vactive < 64)
         return NULL;
 
-    if (pt->misc & DRM_EDID_PT_STEREO) {
+    if (DRM_EDID_DETAILED_MISC_HAS_STEREO(pt->misc)) {
         printk(KERN_WARNING "stereo mode not supported\n");
         return NULL;
     }
-    if (!(pt->misc & DRM_EDID_PT_SEPARATE_SYNC)) {
-        printk(KERN_WARNING "integrated sync not supported\n");
+    if (!(pt->misc & DRM_EDID_DETAILED_MISC_DIGITAL_SYNC)) {
+        printk(KERN_WARNING "analog sync not supported\n");
+        return NULL;
+    }
+    if (!(pt->misc & DRM_EDID_DETAILED_MISC_DIGITAL_SYNC_SEPARATE)) {
+        printk(KERN_WARNING "digital composite sync not supported\n");
         return NULL;
     }
 
@@ -335,16 +339,17 @@ static struct drm_display_mode *drm_mode_detailed(struct drm_device *dev,
 
     drm_mode_set_name(mode);
 
-    if (pt->misc & DRM_EDID_PT_INTERLACED)
+    if (pt->misc & DRM_EDID_DETAILED_MISC_INTERLACED)
         mode->flags |= DRM_MODE_FLAG_INTERLACE;
 
     if (quirks & EDID_QUIRK_DETAILED_SYNC_PP) {
-        pt->misc |= DRM_EDID_PT_HSYNC_POSITIVE | DRM_EDID_PT_VSYNC_POSITIVE;
+        pt->misc |= (DRM_EDID_DETAILED_MISC_DIGITAL_HSYNC_POSITIVE |
+                     DRM_EDID_DETAILED_MISC_DIGITAL_VSYNC_POSITIVE);
     }
 
-    mode->flags |= (pt->misc & DRM_EDID_PT_HSYNC_POSITIVE) ?
+    mode->flags |= (pt->misc & DRM_EDID_DETAILED_MISC_DIGITAL_HSYNC_POSITIVE) ?
         DRM_MODE_FLAG_PHSYNC : DRM_MODE_FLAG_NHSYNC;
-    mode->flags |= (pt->misc & DRM_EDID_PT_VSYNC_POSITIVE) ?
+    mode->flags |= (pt->misc & DRM_EDID_DETAILED_MISC_DIGITAL_VSYNC_POSITIVE) ?
         DRM_MODE_FLAG_PVSYNC : DRM_MODE_FLAG_NVSYNC;
 
     mode->width_mm = pt->width_mm_lo | (pt->width_height_mm_hi & 0xf) << 8;
diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h
index c263e4d..8611539 100644
--- a/include/drm/drm_edid.h
+++ b/include/drm/drm_edid.h
@@ -47,30 +47,54 @@ struct std_timing {
     u8 vfreq_aspect;
 } __attribute__((packed));
 
-#define DRM_EDID_PT_HSYNC_POSITIVE (1 << 6)
-#define DRM_EDID_PT_VSYNC_POSITIVE (1 << 5)
-#define DRM_EDID_PT_SEPARATE_SYNC  (3 << 3)
-#define DRM_EDID_PT_STEREO         (1 << 2)
-#define DRM_EDID_PT_INTERLACED     (1 << 1)
+#define DRM_EDID_DETAILED_MISC_INTERLACED              (1 << 7)
+
+#define DRM_EDID_DETAILED_MISC_STEREO_MASK             ((3 << 5) | (1 << 0))
+#define DRM_EDID_DETAILED_MISC_STEREO_NONE_0           ((0 << 5) | (0 << 0))
+#define DRM_EDID_DETAILED_MISC_STEREO_NONE_1           ((0 << 5) | (1 << 0))
+#define DRM_EDID_DETAILED_MISC_STEREO_FIELD_RIGHT      ((1 << 5) | (0 << 0))
+#define DRM_EDID_DETAILED_MISC_STEREO_FIELD_LEFT       ((2 << 5) | (0 << 0))
+#define DRM_EDID_DETAILED_MISC_STEREO_2WAY_RIGHT       ((1 << 5) | (1 << 0))
+#define DRM_EDID_DETAILED_MISC_STEREO_2WAY_LEFT        ((2 << 5) | (1 << 0))
+#define DRM_EDID_DETAILED_MISC_STEREO_4WAY             ((3 << 5) | (0 << 0))
+#define DRM_EDID_DETAILED_MISC_STEREO_SIDE_BY_SIDE     ((3 << 5) | (1 << 0))
+#define DRM_EDID_DETAILED_MISC_HAS_STEREO(x)           (((x) & (3 << 5)) != 0)
+
+#define DRM_EDID_DETAILED_MISC_DIGITAL_SYNC            (1 << 4)
+/* Analog sync (embedded with signal) */
+#define DRM_EDID_DETAILED_MISC_ANALOG_SYNC_BIPOLAR     (1 << 3)
+#define DRM_EDID_DETAILED_MISC_ANALOG_SYNC_SERRATIONS  (1 << 2)
+#define DRM_EDID_DETAILED_MISC_ANALOG_SYNC_ALL_CHAN    (1 << 1)
+
+/* Digital sync (separate from signal) */
+#define DRM_EDID_DETAILED_MISC_DIGITAL_SYNC_SEPARATE   (1 << 3)
+
+/* Digital composite sync */
+#define DRM_EDID_DETAILED_MISC_DIGITAL_SYNC_SERRATIONS (1 << 2)
+
+/* Digital separate sync */
+#define DRM_EDID_DETAILED_MISC_DIGITAL_HSYNC_POSITIVE  (1 << 2)
+#define DRM_EDID_DETAILED_MISC_DIGITAL_VSYNC_POSITIVE  (1 << 1)
 
 /* If detailed data is pixel timing */
 struct detailed_pixel_timing {
-   u8 hactive_lo;
-   u8 hblank_lo;
-   u8 hactive_hblank_hi;
-   u8 vactive_lo;
-   u8 vblank_lo;
-   u8 vactive_vblank_hi;
-   u8 hsync_offset_lo;
-   u8 hsync_pulse_width_lo;
-   u8 vsync_offset_pulse_width_lo;
-   u8 hsync_vsync_offset_pulse_width_hi;
-   u8 width_mm_lo;
-   u8 height_mm_lo;
-   u8 width_height_mm_hi;
-   u8 hborder;
-   u8 vborder;
-   u8 misc;
+   /* first two bytes are the clock */
+   u8 hactive_lo;  /* 2 */
+   u8 hblank_lo; 
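
To make the new bit layout concrete, here is a small user-space sketch that decodes a misc flags byte the way the patched drm_mode_detailed() appears to. The bit definitions are copied from the header hunk of this patch; the `misc_info` struct and the `decode_misc` helper are hypothetical names introduced only for this illustration and are not part of the DRM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit definitions copied from the header hunk of the patch. */
#define DRM_EDID_DETAILED_MISC_INTERLACED             (1 << 7)
#define DRM_EDID_DETAILED_MISC_HAS_STEREO(x)          (((x) & (3 << 5)) != 0)
#define DRM_EDID_DETAILED_MISC_DIGITAL_SYNC           (1 << 4)
#define DRM_EDID_DETAILED_MISC_DIGITAL_SYNC_SEPARATE  (1 << 3)
#define DRM_EDID_DETAILED_MISC_DIGITAL_HSYNC_POSITIVE (1 << 2)
#define DRM_EDID_DETAILED_MISC_DIGITAL_VSYNC_POSITIVE (1 << 1)

/* Hypothetical result type, used only by this sketch. */
struct misc_info {
    bool supported;       /* false for stereo, analog or composite sync */
    bool interlaced;
    bool hsync_positive;
    bool vsync_positive;
};

/* Apply the same checks, in the same order, as the patched parser. */
static struct misc_info decode_misc(uint8_t misc)
{
    struct misc_info info = { false, false, false, false };

    if (DRM_EDID_DETAILED_MISC_HAS_STEREO(misc))
        return info;    /* stereo mode not supported */
    if (!(misc & DRM_EDID_DETAILED_MISC_DIGITAL_SYNC))
        return info;    /* analog sync not supported */
    if (!(misc & DRM_EDID_DETAILED_MISC_DIGITAL_SYNC_SEPARATE))
        return info;    /* digital composite sync not supported */

    info.supported = true;
    info.interlaced = (misc & DRM_EDID_DETAILED_MISC_INTERLACED) != 0;
    info.hsync_positive =
        (misc & DRM_EDID_DETAILED_MISC_DIGITAL_HSYNC_POSITIVE) != 0;
    info.vsync_positive =
        (misc & DRM_EDID_DETAILED_MISC_DIGITAL_VSYNC_POSITIVE) != 0;
    return info;
}
```

For example, decode_misc(0x1e) reports a supported, progressive mode with both sync polarities positive, while decode_misc(0x9a) reports an interlaced mode with negative hsync and positive vsync. A byte with either of bits 5 and 6 set, such as 0x3e, is rejected as stereo.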

[Bug 21814] Mesa 7.6-devel implementation error: invalid texture object Target value

2009-06-25 Thread bugzilla-daemon
http://bugs.freedesktop.org/show_bug.cgi?id=21814


Fabio <fabio@libero.it> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26000|0                           |1
        is obsolete|                            |
  Attachment #26004|0                           |1
        is obsolete|                            |
  Attachment #26006|0                           |1
        is obsolete|                            |
  Attachment #26710|0                           |1
        is obsolete|                            |




--- Comment #14 from Fabio fabio@libero.it  2009-06-25 00:48:18 PST ---
Created an attachment (id=27117)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=27117)
wine output with mesa master 2009-06-25 (commit cdbcb051)

(In reply to comment #13)
> > I will retest when bug 21582 is fixed.
>
> Bug 21582 is marked as fixed but I am still having the same assertion of
> https://bugs.freedesktop.org/attachment.cgi?id=26710.
>
> See also bug 22438.

OK, bug 22438 is fixed but now the game crashes with a different backtrace
(attached).


-- 
Configure bugmail: http://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.



Re: [RFC][PATCH] drm/radeon/kms: add initial colortiling support.

2009-06-25 Thread Jerome Glisse
On Wed, 2009-06-24 at 22:32 +0200, Roland Scheidegger wrote:
> On 24.06.2009 20:17, Jerome Glisse wrote:
> > I think we should let user ask at gem map ioctl time if userspace wants
> > an surface backed mapping or not, and gem map will reply with a success
> > or failure. So if object is in vram and there is a surface reg available
> > it will succeed, if object is in system ram it will report to userspace
> > that there is not automatic untiling and that userspace is on its own
> > to untile the buffer.
> >
> > For the X server that the front buffer is mapped first and never
> > unmapped, it should get a surface (assuming no other process already
> > stole all the surface). For pixmap i think be better of not using
> > tiling for time being (or macro tiling only benchmark below).
> >
> > Mesa, map/unmap things and should be able to untile on its own for
> > front/zbuffer (we need to add texture but i am not sure it's worth
> > it, see benchmark below).
> I don't see benchmark with texture tiling below...
> It definitely made some difference though when I implemented (and
> measured...) this, though I never really worried that much about tiled
> compressed textures, not sure micro tiled is even possible (and would
> make sense) but macro tiled certainly should be (but IIRC I tried to
> measure it and it didn't make much of a difference on r200 but it could
> have changed with newer chips).
> That said, don't forget that the performance improvement this gives is
> chip specific, generally giving more improvement with newer chips. IIRC
> you definitely don't want to micro tile the front buffer pre-r300.
>
> Roland

Yeah, I lost the texture benchmark, but the gain was very small, 1-2% on
quake3. Maybe quake3 isn't asking for much texture filtering, assuming
filtering is the process which benefits from tiled textures.

Cheers,
Jerome




[Bug 22438] radeon_common.c:1016: radeon_validate_bo: Assertion `radeon->state.validated_bo_count < 32' failed.

2009-06-25 Thread bugzilla-daemon
http://bugs.freedesktop.org/show_bug.cgi?id=22438


Fabio <fabio@libero.it> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




--- Comment #3 from Fabio fabio@libero.it  2009-06-25 00:49:29 PST ---
(In reply to comment #2)
> can you retest with mesa master, I fixed a bug that can cause this to happen.
>

Confirmed fixed, thanks!

Can you take a look at bug 21814 (see comment 14).




Re: [PATCH] radeon: preallocate memory for command stream parsing

2009-06-25 Thread Pekka Enberg
Hi Jerome,

On Tue, 2009-06-23 at 22:52 +0300, Pekka Enberg wrote:
> > On Tue, Jun 23, 2009 at 10:46 PM, Jerome Glisse<jgli...@redhat.com> wrote:
> > > Command stream parsing is the most common operation and can
> > > happen hundred of times per second, we don't want to allocate/free
> > > memory each time this ioctl is call. This rework the ioctl
> > > to avoid doing so by allocating temporary memory along the
> > > ib pool.
> > >
> > > Signed-off-by: Jerome Glisse <jgli...@redhat.com>
> >
> > So how much does this help (i.e. where are the numbers)? I am bit
> > surprised hundred of times per second is an issue for our slab
> > allocators. Hmm?

On Wed, 2009-06-24 at 10:29 +0200, Jerome Glisse wrote:
> I didn't have real number but the vmap path was really slower,
> quake3 fps goes from ~20 to ~40 on average when going from vmap
> to preallocated. When using kmalloc i don't thing there was so
> much performance hit. But i think the biggest hit was that in
> previous code i asked for zeroed memory so i am pretty sure kernel
> spend a bit of time clearing page. I reworked the code to avoid
> needing cleared memory and so avoid memset, this is likely why
> we get a performance boost.

OK. If kmalloc() (without memset) really was too slow for your case, I'd
be interested in looking at it in more detail. I'm not completely
convinced the memory pool is needed here but I'm not a DRM expert so I'm
not NAK'ing this either...

Pekka
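
For readers following the allocate-per-ioctl versus preallocate argument, here is a minimal user-space sketch of the preallocation idea. It is not the radeon patch: the names `scratch_pool`, `pool_init`, `pool_get`, `pool_put` and `pool_fini`, the slot count, and the use of malloc() are all assumptions made for illustration, and the 64K slot size is borrowed from the allocation size mentioned elsewhere in this thread. Slots are handed out without being zeroed, which mirrors the stated goal of avoiding the memset; there is no locking.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 16            /* assumed slot count, for illustration */
#define SLOT_SIZE  (64 * 1024)   /* assumed size of one scratch buffer */

struct scratch_pool {
    unsigned char *base;              /* single allocation made up front */
    unsigned char used[POOL_SLOTS];   /* 1 if the slot is handed out */
};

/* Allocate the backing memory once; returns 0 on success, -1 on failure. */
static int pool_init(struct scratch_pool *pool)
{
    size_t i;

    pool->base = malloc((size_t)POOL_SLOTS * SLOT_SIZE);
    if (!pool->base)
        return -1;
    for (i = 0; i < POOL_SLOTS; i++)
        pool->used[i] = 0;
    return 0;
}

/* Hand out a free slot without allocating or clearing; NULL if exhausted. */
static void *pool_get(struct scratch_pool *pool)
{
    size_t i;

    for (i = 0; i < POOL_SLOTS; i++) {
        if (!pool->used[i]) {
            pool->used[i] = 1;
            return pool->base + i * SLOT_SIZE;
        }
    }
    return NULL;
}

/* Return a slot previously obtained from pool_get(). */
static void pool_put(struct scratch_pool *pool, void *slot)
{
    unsigned char *p = slot;
    size_t i;

    if (!p)
        return;
    i = (size_t)(p - pool->base) / SLOT_SIZE;
    if (i < POOL_SLOTS)
        pool->used[i] = 0;
}

/* Release the backing memory. */
static void pool_fini(struct scratch_pool *pool)
{
    free(pool->base);
    pool->base = NULL;
}
```

The trade-off discussed in the thread is visible here: the hot path does no allocation and no clearing, at the cost of holding POOL_SLOTS * SLOT_SIZE bytes for the lifetime of the pool and of needing a policy for the exhausted case.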




Re: [RFC][PATCH] drm/radeon/kms: add initial colortiling support.

2009-06-25 Thread Stephane Marchesin
On Thu, Jun 25, 2009 at 09:46, Jerome Glissegli...@freedesktop.org wrote:
 On Wed, 2009-06-24 at 22:32 +0200, Roland Scheidegger wrote:
 On 24.06.2009 20:17, Jerome Glisse wrote:
  I think we should let user ask at gem map ioctl time if userspace wants
  an surface backed mapping or not, and gem map will reply with a success
  or failure. So if object is in vram and there is a surface reg available
  it will succeed, if object is in system ram it will report to userspace
  that there is not automatic untiling and that userspace is on its own
  to untile the buffer.
 
  For the X server that the front buffer is mapped first and never
  unmapped, it should get a surface (assuming no other process already
  stole all the surface). For pixmap i think be better of not using
  tiling for time being (or macro tiling only benchmark below).
 
  Mesa, map/unmap things and should be able to untile on its own for
  front/zbuffer (we need to add texture but i am not sure it's worth
  it, see benchmark below).
 I don't see benchmark with texture tiling below...
 It definitely made some difference though when I implemented (and
 measured...) this, though I never really worried that much about tiled
 compressed textures, not sure micro tiled is even possible (and would
 make sense) but macro tiled certainly should be (but IIRC I tried to
 measure it and it didn't make much of a difference on r200 but it could
 have changed with newer chips).
 That said, don't forget that the performance improvement this gives is
 chip specific, generally giving more improvement with newer chips. IIRC
 you definitely don't want to micro tile the front buffer pre-r300.

 Roland

 Yeah, I lost the texture benchmark, but the gain was very small, 1-2% on
 quake3. Maybe quake3 isn't asking for much texture filtering, assuming
 filtering is the process which benefits from tiled textures.


IIRC the microtiling mode will only benefit the exotic filtering modes
(anisotropic for example). Did you try this?

Stephane



Re: Why was old TTM removed from drm.git?

2009-06-25 Thread Thomas Hellström

Michel Dänzer wrote:
> On Wed, 2009-06-24 at 19:26 +0200, Thomas Hellström wrote:
> > Just prior to the commit I sent out a message explaining what I was
> > going to do and why, but apparently it didn't make it to the list
> > (which seems to be the case of quite a few mails these days).
>
> What was the From: address and subject of that mail (or any others that
> were apparently lost)? I can't seem to find anything in the dri-devel
> moderation queue mails around the weekend, so apparently it was dropped
> before it reached mailman. Maybe some sf.net spam filter or something.

Hi!
Attaching the mail and another one that was stripped. Furthermore
I had two patches sent by git-send-email stripped away, but when I 
routed them through the vmware smtp server they arrived.


It might be that sf.net doesn't like my isp email server.

/Thomas.



---BeginMessage---

Hi!

I'm about to push a commit that strips old TTM from the drm git repo. 
Not sure if anybody uses it for other things than libdrm, but at least 
that will remove some unused code.


The master nouveau driver will be disabled since it depends on old ttm.

/Thomas


---End Message---
---BeginMessage---

Okias,

The documentation available is the Xorg wiki TTM page (a little
outdated) and the ttm header files
which are quite well commented.

Currently there are three drivers using it
1) The Radeon KMS driver using a subset of the TTM functionality.
2) The Intel moorestown / Poulsbo driver which uses the full TTM
functionality including modesetting. Look at the list archives for
pointers to that.
3) The openchrome driver in the modesetting-newttm branch. No
modesetting yet for that one.

Note that the latter 2 drivers are using a tiny TTM user-space interface
which is never going to make it to the mainstream kernel. The openChrome
driver will be patched up to use a driver-specific version of that
interface.

/Thomas


okias wrote:

Hello,

Does any documentation exist for newttm (apart from the existing
drivers), plus any HowTo for converting an fb driver + xorg driver to
support the memory manager + KMS?

Thanks

okias

  




---End Message---


Re: [PATCH] radeon: preallocate memory for command stream parsing

2009-06-25 Thread Thomas Hellström
Pekka Enberg wrote:
 Hi Jerome,

 On Tue, 2009-06-23 at 22:52 +0300, Pekka Enberg wrote:
   
 On Tue, Jun 23, 2009 at 10:46 PM, Jerome Glissejgli...@redhat.com wrote:
   
 Command stream parsing is the most common operation and can
 happen hundred of times per second, we don't want to allocate/free
 memory each time this ioctl is call. This rework the ioctl
 to avoid doing so by allocating temporary memory along the
 ib pool.

 Signed-off-by: Jerome Glisse jgli...@redhat.com
 
 So how much does this help (i.e. where are the numbers)? I am bit
 surprised hundred of times per second is an issue for our slab
 allocators. Hmm?
   

 On Wed, 2009-06-24 at 10:29 +0200, Jerome Glisse wrote:
   
 I didn't have real number but the vmap path was really slower,
 quake3 fps goes from ~20 to ~40 on average when going from vmap
 to preallocated. When using kmalloc i don't thing there was so
 much performance hit. But i think the biggest hit was that in
 previous code i asked for zeroed memory so i am pretty sure kernel
 spend a bit of time clearing page. I reworked the code to avoid
 needing cleared memory and so avoid memset, this is likely why
 we get a performance boost.
 

 OK. If kmalloc() (without memset) really was too slow for your case, I'd
 be interested in looking at it in more detail. I'm not completely
 convinced the memory pool is needed here but I'm not a DRM expert so I'm
 not NAK'ing this either...

   Pekka

   
Hi!
From previous experience with other drivers, kmalloc() is just fine
performance-wise.
I've also never seen memsetting pages turn up on the profile. It would 
be interesting to see an oprofile timing of this to try and pinpoint 
what's happening.

However, in this case, I believe Jerome was forced to use vmalloc to 
guarantee that the allocation would succeed, and frequent vmallocs seem 
to be a performance killer.

One should also be careful about frame-rates. Tuning memory manager / 
command submission operation is usually a matter of how much CPU is 
consumed for a given framerate. If one compares framerates one must make 
sure that the CPU is at nearly 100% while benchmarking.

/Thomas





Re: Why was old TTM removed from drm.git?

2009-06-25 Thread Michel Dänzer
On Thu, 2009-06-25 at 10:55 +0200, Thomas Hellström wrote:
> Michel Dänzer wrote:
> > On Wed, 2009-06-24 at 19:26 +0200, Thomas Hellström wrote:
> > > Just prior to the commit I sent out a message explaining what I was
> > > going to do and why, but apparently it didn't make it to the list
> > > (which seems to be the case of quite a few mails these days).
> >
> > What was the From: address and subject of that mail (or any others that
> > were apparently lost)? I can't seem to find anything in the dri-devel
> > moderation queue mails around the weekend, so apparently it was dropped
> > before it reached mailman. Maybe some sf.net spam filter or something.
>
> Hi!
> Attaching the mail and another one that was stripped. Furthermore
> I had two patches sent by git-send-email stripped away, but when I
> routed them through the vmware smtp server they arrived.
>
> It might be that sf.net doesn't like my isp email server.

Yeah, something like that seems like the most plausible explanation.


-- 
Earthling Michel Dänzer   |http://www.vmware.com
Libre software enthusiast |  Debian, X and DRI developer



[Bug 22371] [DRI2] TORCS causes assertion failure in r200 dri driver

2009-06-25 Thread bugzilla-daemon
http://bugs.freedesktop.org/show_bug.cgi?id=22371





--- Comment #3 from Pauli suok...@gmail.com  2009-06-25 03:34:47 PST ---
No. Still same assertion failure.


I will have a look at what torcs is trying to render there. It might help
solve the problem.




Re: [PATCH] radeon: preallocate memory for command stream parsing

2009-06-25 Thread Jerome Glisse
On Thu, 2009-06-25 at 11:03 +0200, Thomas Hellström wrote:
 Pekka Enberg wrote:
  Hi Jerome,
 
  On Tue, 2009-06-23 at 22:52 +0300, Pekka Enberg wrote:

  On Tue, Jun 23, 2009 at 10:46 PM, Jerome Glissejgli...@redhat.com wrote:

  Command stream parsing is the most common operation and can
  happen hundred of times per second, we don't want to allocate/free
  memory each time this ioctl is call. This rework the ioctl
  to avoid doing so by allocating temporary memory along the
  ib pool.
 
  Signed-off-by: Jerome Glisse jgli...@redhat.com
  
  So how much does this help (i.e. where are the numbers)? I am bit
  surprised hundred of times per second is an issue for our slab
  allocators. Hmm?

 
  On Wed, 2009-06-24 at 10:29 +0200, Jerome Glisse wrote:

  I didn't have real number but the vmap path was really slower,
  quake3 fps goes from ~20 to ~40 on average when going from vmap
  to preallocated. When using kmalloc i don't thing there was so
  much performance hit. But i think the biggest hit was that in
  previous code i asked for zeroed memory so i am pretty sure kernel
  spend a bit of time clearing page. I reworked the code to avoid
  needing cleared memory and so avoid memset, this is likely why
  we get a performance boost.
  
 
  OK. If kmalloc() (without memset) really was too slow for your case, I'd
  be interested in looking at it in more detail. I'm not completely
  convinced the memory pool is needed here but I'm not a DRM expert so I'm
  not NAK'ing this either...
 
  Pekka
 

 Hi!
  From previous experience with other drivers kmalloc() is just fine 
 performance-wise.
 I've also never seen memsetting pages turn up on the profile. It would 
 be interesting to see an oprofile timing of this to try and pinpoint 
 what's happening.
 
 However, in this case, I believe Jerome was forced to use vmalloc to 
 guarantee that the allocation would succeed, and frequent vmallocs seem 
 to be a performance killer.
 
 One should also be careful about frame-rates. Tuning memory manager / 
 command submission operation is usually a matter of how much CPU is 
 consumed for a given framerate. If one compares framerates one must make 
 sure that the CPU is at nearly 100% while benchmarking.
 
 /Thomas
 

To give a more correct rough estimate: at 60fps we issue something like
10 to 50 cs ioctls per frame, so that's up to a few thousand cs ioctls
per second, and as many 64K allocations plus memory clearing. I believe
such things would show up in a profile. I am not running a kernel without
my patch since I am working on other stuff now, but I will lower the pool
size so that we don't waste too much memory. Right now I think the
preallocation uses something around 8M of memory. I think I can get it
down to 1M without impacting performance (even less if we are on PCIe).

Cheers,
Jerome
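
The estimate above can be worked through explicitly. The helpers below are a back-of-the-envelope sketch only; the function names are made up, and the calculation assumes exactly one 64K (65536-byte) allocation per cs ioctl, which is a simplification of what the message describes.

```c
#include <stdint.h>

/* Number of cs ioctls issued per second at a given frame rate. */
static uint64_t cs_ioctls_per_second(uint64_t fps, uint64_t ioctls_per_frame)
{
    return fps * ioctls_per_frame;
}

/* Bytes allocated (and cleared, if zeroed memory is requested) per second. */
static uint64_t alloc_bytes_per_second(uint64_t ioctls_per_second,
                                       uint64_t alloc_size)
{
    return ioctls_per_second * alloc_size;
}

/* How many buffers of alloc_size fit in a preallocated pool. */
static uint64_t pool_slots(uint64_t pool_size, uint64_t alloc_size)
{
    return pool_size / alloc_size;
}
```

At 60fps, 10 ioctls per frame is 600 ioctls per second and about 37.5 MiB allocated per second; 50 ioctls per frame is 3000 ioctls per second and about 187.5 MiB per second. Under the same assumption, an 8M pool holds 128 such buffers and a 1M pool holds 16.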




Re: [RFC][PATCH] drm/radeon/kms: add initial colortiling support.

2009-06-25 Thread Jerome Glisse
On Wed, 2009-06-24 at 10:21 +1000, Dave Airlie wrote:
> From: Dave Airlie <airl...@redhat.com>
>
> This adds color tiling support for buffers in VRAM, it enables
> a color tiled fbcon and a color tiled X frontbuffer.
>
> It changes the API:
> adds two new parameters to the object creation API (is this better than
>  a set/get tiling?) we probably still need a get but not sure for what yet.
> relocs are required for 2D DST_PITCH_OFFSET and SRC_PITCH_OFFSET type-0,
> and 3D COLORPITCH registers.
>
> TTM:
> adds a new check_tiling call to TTM, gets called at fault and around
> bo moves.
>
> Issues:
> Can we integrate endian swapping in with this?
> Depth buffer handling?
> When we run out of surface regs - needs better handling.
>
> Future features:
> texture tiling - need to relocate texture registers TXOFFSET* with tiling
> info.

I worked on that today and I think we can restrict ourselves to setting
a surface only for pinned buffers (i.e. scanout buffers). Here is the use
case which I believe defeats surfaces: a tiled bo in vram is mapped with
a surface backing it, so userspace knows it can access the surface
linearly. The bo then gets evicted from vram and the userspace mapping is
invalidated. There is no way (at least I don't think so, except calling
some ioctl on each memory access, which is a no-go) for userspace to
know that it now has to untile by itself.

So I believe we should only set a surface on the scanout buffer and let
userspace deal with untiling. I don't think this is a drawback: we can
use a blit to untile a buffer (macro only) or simply have clever code to
access the memory. If we are in a fallback we have already lost.

Am I missing something? Does it make sense to only program a surface on
scanout (this would give a simpler API for tiling)?

Cheers,
Jerome
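
As a rough picture of what "userspace deals with untiling" could mean, here is a generic sketch of software untiling. It deliberately does not reproduce the actual radeon micro or macro tile layouts, which this thread does not spell out. The assumptions are: 32-bit pixels, square 8x8 tiles stored one after another in row-major order, pixels inside a tile also row-major, and a width and height that are multiples of the tile size. `TILE_W`, `TILE_H`, `tiled_offset` and `untile32` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define TILE_W 8   /* assumed tile width in pixels */
#define TILE_H 8   /* assumed tile height in pixels */

/*
 * Pixel index of (x, y) inside a tiled buffer that is `width` pixels wide.
 * width must be a multiple of TILE_W.
 */
static size_t tiled_offset(size_t x, size_t y, size_t width)
{
    size_t tiles_per_row = width / TILE_W;
    size_t tile_index = (y / TILE_H) * tiles_per_row + (x / TILE_W);
    size_t in_tile = (y % TILE_H) * TILE_W + (x % TILE_W);

    return tile_index * (TILE_W * TILE_H) + in_tile;
}

/* Copy a tiled buffer into a linear one, pixel by pixel. */
static void untile32(uint32_t *linear, const uint32_t *tiled,
                     size_t width, size_t height)
{
    size_t x, y;

    for (y = 0; y < height; y++)
        for (x = 0; x < width; x++)
            linear[y * width + x] = tiled[tiled_offset(x, y, width)];
}
```

The same `tiled_offset` helper could back an accessor for individual pixels, which is one reading of the "clever code to access the memory" option; a real implementation would need the hardware's actual address swizzle in its place.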




Re: [RFC][PATCH] drm/radeon/kms: add initial colortiling support.

2009-06-25 Thread Roland Scheidegger
On 25.06.2009 10:32, Stephane Marchesin wrote:
 On Thu, Jun 25, 2009 at 09:46, Jerome Glissegli...@freedesktop.org wrote:
 On Wed, 2009-06-24 at 22:32 +0200, Roland Scheidegger wrote:
 On 24.06.2009 20:17, Jerome Glisse wrote:
 I think we should let user ask at gem map ioctl time if userspace wants
 an surface backed mapping or not, and gem map will reply with a success
 or failure. So if object is in vram and there is a surface reg available
 it will succeed, if object is in system ram it will report to userspace
 that there is not automatic untiling and that userspace is on its own
 to untile the buffer.

 For the X server that the front buffer is mapped first and never
 unmapped, it should get a surface (assuming no other process already
 stole all the surface). For pixmap i think be better of not using
 tiling for time being (or macro tiling only benchmark below).

 Mesa, map/unmap things and should be able to untile on its own for
 front/zbuffer (we need to add texture but i am not sure it's worth
 it, see benchmark below).
 I don't see benchmark with texture tiling below...
 It definitely made some difference though when I implemented (and
 measured...) this, though I never really worried that much about tiled
 compressed textures, not sure micro tiled is even possible (and would
 make sense) but macro tiled certainly should be (but IIRC I tried to
 measure it and it didn't make much of a difference on r200 but it could
 have changed with newer chips).
 That said, don't forget that the performance improvement this gives is
 chip specific, generally giving more improvement with newer chips. IIRC
 you definitely don't want to micro tile the front buffer pre-r300.

 Roland
 Yeah, I lost the texture benchmark, but the gain was very small, 1-2% on
 quake3. Maybe quake3 isn't asking for much texture filtering, assuming
 filtering is the process which benefits from tiled textures.

 
 IIRC the microtiling mode will only benefit the exotic filtering modes
 (anisotropic for example). Did you try this?

I am pretty sure it made quite a difference with normal trilinear
filtering (otherwise I never would have bothered implementing it in the
first place). Can't remember exactly but probably around 10% or so. Not
sure if macro or micro tiling helped more, but both together were
definitely accounting for more than 2% (unless using compressed
textures). You are right though I think with bilinear (which q3 uses as
default) there was less difference. That was on rv250 back then, and it
could be different on newer chips (could depend quite a bit on if it's a
chip with a lot of memory bandwidth or not too).

Roland



[Bug 13610] radeon kms invalid edid data at lvds

2009-06-25 Thread bugzilla-daemon
http://bugzilla.kernel.org/show_bug.cgi?id=13610


Andrew Morton <a...@linux-foundation.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |a...@linux-foundation.org




--- Comment #1 from Andrew Morton <a...@linux-foundation.org>  2009-06-25 19:47:11 ---
There's no such kernel version as 2.6.31-rc0.

I assume that you meant 2.6.30?

Thanks.

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.



[Bug 13610] radeon kms invalid edid data at lvds

2009-06-25 Thread bugzilla-daemon
http://bugzilla.kernel.org/show_bug.cgi?id=13610


Rafael J. Wysocki <r...@sisk.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |r...@sisk.pl
             Blocks|                            |13615




--- Comment #2 from Rafael J. Wysocki r...@sisk.pl  2009-06-25 20:54:28 ---
I think he meant a post-2.6.30 git kernel.



[PATCH] drm/radeon: fix support for vline relocations.

2009-06-25 Thread Dave Airlie
From: Dave Airlie <airl...@redhat.com>

Userspace sends us a special relocation type to sync video/exa
to vlines to avoid tearing, this deals with the relocation
in the kernel, it picks the correct crtc and avoids issues
where crtcs are disabled.

Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 drivers/gpu/drm/radeon/r100.c |   89 +
 drivers/gpu/drm/radeon/r300.c |   13 +-
 drivers/gpu/drm/radeon/r500_reg.h |2 +
 drivers/gpu/drm/radeon/rv515.c|   23 +++--
 4 files changed, 121 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index c550932..bbc4cb5 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -753,6 +753,86 @@ int r100_cs_packet_parse(struct radeon_cs_parser *p,
 }
 
 /**
+ * r100_cs_packet_parse_vline() - parse userspace VLINE packet
+ * @p:		parser structure holding parsing context.
+ *
+ * Userspace sends a special sequence for VLINE waits.
+ * PACKET0 - VLINE_START_END + value
+ * PACKET0 - WAIT_UNTIL +_value
+ * RELOC (P3) - crtc_id in reloc.
+ *
+ * This function parses this and relocates the VLINE START END
+ * and WAIT UNTIL packets to the correct crtc.
+ * It also detects a switched off crtc and nulls out the
+ * wait in that case.
+ */
+int r100_cs_packet_parse_vline(struct radeon_cs_parser *p)
+{
+	struct radeon_cs_chunk *ib_chunk;
+	struct drm_mode_object *obj;
+	struct drm_crtc *crtc;
+	struct radeon_crtc *radeon_crtc;
+	struct radeon_cs_packet p3reloc, waitreloc;
+	int crtc_id;
+	int r;
+	uint32_t header, h_idx, reg;
+
+	/* jump over the wait until */
+	r = r100_cs_packet_parse(p, &waitreloc, p->idx);
+	if (r)
+		return r;
+
+	/* jump over the NOP */
+	r = r100_cs_packet_parse(p, &p3reloc, p->idx);
+	if (r)
+		return r;
+	h_idx = p->idx - 2;
+	p->idx += waitreloc.count;
+	p->idx += p3reloc.count;
+
+	ib_chunk = &p->chunks[p->chunk_ib_idx];
+	header = ib_chunk->kdata[h_idx];
+	crtc_id = ib_chunk->kdata[h_idx + 5];
+	reg = ib_chunk->kdata[h_idx] << 2;
+	mutex_lock(&p->rdev->ddev->mode_config.mutex);
+	obj = drm_mode_object_find(p->rdev->ddev, crtc_id, DRM_MODE_OBJECT_CRTC);
+	if (!obj) {
+		DRM_ERROR("cannot find crtc %d\n", crtc_id);
+		r = -EINVAL;
+		goto out;
+	}
+	crtc = obj_to_crtc(obj);
+	radeon_crtc = to_radeon_crtc(crtc);
+	crtc_id = radeon_crtc->crtc_id;
+
+	if (!crtc->enabled) {
+		/* if the CRTC isn't enabled - we need to nop out the wait until */
+		ib_chunk->kdata[h_idx + 2] = PACKET2(0);
+		ib_chunk->kdata[h_idx + 3] = PACKET2(0);
+	} else if (crtc_id == 1) {
+		switch (reg) {
+		case AVIVO_D1MODE_VLINE_START_END:
+			header &= ~R300_CP_PACKET0_REG_MASK;
+			header |= AVIVO_D2MODE_VLINE_START_END >> 2;
+			break;
+		case RADEON_CRTC_GUI_TRIG_VLINE:
+			header &= ~R300_CP_PACKET0_REG_MASK;
+			header |= RADEON_CRTC2_GUI_TRIG_VLINE >> 2;
+			break;
+		default:
+			DRM_ERROR("unknown crtc reloc\n");
+			r = -EINVAL;
+			goto out;
+		}
+		ib_chunk->kdata[h_idx] = header;
+		ib_chunk->kdata[h_idx + 3] |= RADEON_ENG_DISPLAY_SELECT_CRTC1;
+	}
+out:
+	mutex_unlock(&p->rdev->ddev->mode_config.mutex);
+	return r;
+}
+
+/**
  * r100_cs_packet_next_reloc() - parse next packet which should be reloc packet3
  * @parser:parser structure holding parsing context.
  * @data:  pointer to relocation data
@@ -825,6 +905,15 @@ static int r100_packet0_check(struct radeon_cs_parser *p,
 	}
 	for (i = 0; i <= pkt->count; i++, idx++, reg += 4) {
 		switch (reg) {
+		case RADEON_CRTC_GUI_TRIG_VLINE:
+			r = r100_cs_packet_parse_vline(p);
+			if (r) {
+				DRM_ERROR("No reloc for ib[%d]=0x%04X\n",
+						idx, reg);
+				r100_cs_dump_packet(p, pkt);
+				return r;
+			}
+			break;
 		/* FIXME: only allow PACKET3 blit? easier to check for out of
 		 * range access */
 		case RADEON_DST_PITCH_OFFSET:
diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c
index e2ed5bc..da87869 100644
--- a/drivers/gpu/drm/radeon/r300.c
+++ b/drivers/gpu/drm/radeon/r300.c
@@ -44,6 +44,7 @@ int r100_gui_wait_for_idle(struct radeon_device *rdev);
 int r100_cs_packet_parse(struct radeon_cs_parser *p,
 

Re: TTM page pool allocator

2009-06-25 Thread Dave Airlie
On Thu, Jun 25, 2009 at 10:01 PM, Jerome Glisse gli...@freedesktop.org wrote:
 Hi,

 Thomas i attach a reworked page pool allocator based on Dave works,
 this one should be ok with ttm cache status tracking. It definitely
 helps on AGP system, now the bottleneck is in mesa vertex's dma
 allocation.


My original version kept a list of WB pages as well. This proved to be
quite a useful optimisation on my test systems when I implemented it:
without it I was spending ~20% of my CPU in getting free pages. Granted,
I always used WB pages on PCIE/IGP systems.

Another optimisation I made at the time was around the populate call
(not sure if this is what still happens):

Allocate a 64K local BO for DMA object.
Write into the first 5 pages from userspace - get WB pages.
Bind to GART, swap those 5 pages to WC + flush.
Then populate the rest with WC pages from the list.

Granted I think allocating WC in the first place from the pool might
work just as well since most of the DMA buffers are write only.

Dave.



Re: [PATCH 1/2] drm: Fix EDID detailed timing misc flags decoding

2009-06-25 Thread Keith Packard
On Thu, 2009-06-25 at 16:50 -0700, Jesse Barnes wrote:

 Is this mixing up the pixel block definition of sync vs the basic
 block definition (which has all the composite, green etc bits)?

The VESA spec seems to allow detailed modes to send sync signals in
different ways -- embedded with the analog signal, on one separate wire
or on two separate wires.

 Looks ok, but according to wikipedia hsync+ is 1<<1 and vsync+ is 1<<2
 instead.  Other than that the #defines look ok (and I wouldn't trust
 wikipedia; iirc it had a few errors when I looked at it last).

Wikipedia is correct and my code was wrong. good catch!

-- 
keith.pack...@intel.com




Re: [PATCH 1/2] drm: Fix EDID detailed timing misc flags decoding

2009-06-25 Thread Jesse Barnes
On Wed, 24 Jun 2009 23:49:02 -0700
Keith Packard kei...@keithp.com wrote:
 - if (pt->misc & DRM_EDID_PT_STEREO) {
 + if (DRM_EDID_DETAILED_MISC_HAS_STEREO(pt->misc)) {
   printk(KERN_WARNING "stereo mode not supported\n");
   return NULL;

Looks correct.

   }
 - if (!(pt->misc & DRM_EDID_PT_SEPARATE_SYNC)) {
 - printk(KERN_WARNING "integrated sync not supported\n");
 + if (!(pt->misc & DRM_EDID_DETAILED_MISC_DIGITAL_SYNC)) {
 + printk(KERN_WARNING "analog sync not supported\n");
 + return NULL;
 + }
 + if (!(pt->misc & DRM_EDID_DETAILED_MISC_DIGITAL_SYNC_SEPARATE)) {
 + printk(KERN_WARNING "digital composite sync not supported\n");
 + return NULL;
   }

Is this mixing up the pixel block definition of sync vs the basic
block definition (which has all the composite, green etc bits)?

  
 @@ -335,16 +339,17 @@ static struct drm_display_mode *drm_mode_detailed(struct drm_device *dev,
   drm_mode_set_name(mode);
  
 - if (pt->misc & DRM_EDID_PT_INTERLACED)
 + if (pt->misc & DRM_EDID_DETAILED_MISC_INTERLACED)
   mode->flags |= DRM_MODE_FLAG_INTERLACE;

This looks right.

  
   if (quirks & EDID_QUIRK_DETAILED_SYNC_PP) {
 - pt->misc |= DRM_EDID_PT_HSYNC_POSITIVE | DRM_EDID_PT_VSYNC_POSITIVE;
 + pt->misc |= (DRM_EDID_DETAILED_MISC_DIGITAL_HSYNC_POSITIVE |
 +              DRM_EDID_DETAILED_MISC_DIGITAL_VSYNC_POSITIVE);
   }

Looks ok, but according to wikipedia hsync+ is 1<<1 and vsync+ is 1<<2
instead.  Other than that the #defines look ok (and I wouldn't trust
wikipedia; iirc it had a few errors when I looked at it last).

-- 
Jesse Barnes, Intel Open Source Technology Center



Re: [PATCH 1/2] drm: Fix EDID detailed timing misc flags decoding

2009-06-25 Thread Jesse Barnes
On Thu, 25 Jun 2009 17:57:47 -0700
Keith Packard kei...@keithp.com wrote:

 On Thu, 2009-06-25 at 16:50 -0700, Jesse Barnes wrote:
 
  Is this mixing up the pixel block definition of sync vs the basic
  block definition (which has all the composite, green etc bits)?
 
 The VESA spec seems to allow detailed modes to send sync signals in
 different ways -- embedded with the analog signal, on one separate
 wire or on two separate wires.

Ok cool, I didn't have it in front of me so I wasn't sure of the
encoding in the detailed timing vs basic block section.

-- 
Jesse Barnes, Intel Open Source Technology Center



Re: [PATCH 1/2] drm: Fix EDID detailed timing misc flags decoding

2009-06-25 Thread Keith Packard
On Thu, 2009-06-25 at 18:44 -0700, Jesse Barnes wrote:

 Ok cool, I didn't have it in front of me so I wasn't sure of the
 encoding in the detailed timing vs basic block section.

Any X.org member can get a complete copy of the VESA specs. Recommended
for review of spec-related patches. This may be the only real benefit of
membership, but it's a pretty useful one.

-- 
keith.pack...@intel.com

