[radeon] EDID checksum is invalid

2011-12-05 Thread Alexander Beregalov
Hi

3.2.0-rc3-00015-gaaa0b4f

dmesg |egrep "drm|radeon"

Command line: root=/dev/sda2 radeon.modeset=1 ro
Kernel command line: root=/dev/sda2 radeon.modeset=1 ro
[drm] Initialized drm 1.1.0 20060810
[drm] radeon kernel modesetting enabled.
radeon 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
radeon 0000:01:00.0: setting latency timer to 64
[drm] initializing kernel modesetting (RV730 0x1002:0x9490 0x174B:0xE100).
[drm] register mmio base: 0xF500
[drm] register mmio size: 65536
radeon 0000:01:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
radeon 0000:01:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
[drm] Detected VRAM RAM=512M, BAR=256M
[drm] RAM width 128bits DDR
[drm] radeon: 512M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.
[drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[drm] Driver supports precise vblank timestamp query.
radeon 0000:01:00.0: irq 46 for MSI/MSI-X
radeon 0000:01:00.0: radeon: using MSI.
[drm] radeon: irq initialized.
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] Loading RV730 Microcode
[drm] PCIE GART of 512M enabled (table at 0x0004).
radeon 0000:01:00.0: WB enabled
[drm] ring test succeeded in 0 usecs
[drm] radeon: ib pool ready.
[drm] ib test succeeded in 0 usecs
[drm] Radeon Display Connectors
[drm] Connector 0:
[drm]   VGA
[drm]   DDC: 0x7e20 0x7e20 0x7e24 0x7e24 0x7e28 0x7e28 0x7e2c 0x7e2c
[drm]   Encoders:
[drm] CRT2: INTERNAL_KLDSCP_DAC2
[drm] Connector 1:
[drm]   HDMI-A
[drm]   HPD2
[drm]   DDC: 0x7f10 0x7f10 0x7f14 0x7f14 0x7f18 0x7f18 0x7f1c 0x7f1c
[drm]   Encoders:
[drm] DFP2: INTERNAL_UNIPHY1
[drm] Connector 2:
[drm]   DVI-I
[drm]   HPD1
[drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[drm]   Encoders:
[drm] CRT1: INTERNAL_KLDSCP_DAC1
[drm] DFP1: INTERNAL_UNIPHY
[drm] Internal thermal controller with fan control
[drm] radeon: power management initialized
[drm] fb mappable at 0xE0142000
[drm] vram apper at 0xE0000000
[drm] size 5242880
[drm] fb depth is 24
[drm]    pitch is 5120
fbcon: radeondrmfb (fb0) is primary device
fb0: radeondrmfb frame buffer device
drm: registered panic notifier
[drm] Initialized radeon 2.12.0 20080528 for 0000:01:00.0 on minor 0

At boot time:
[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 206
 Raw EDID:
27 0f 01 03 80 26 1e 78 2a de 95 a3 54 4c 99 26
0f 50 54 bf ef 80 81 80 81 40 71 4f 01 01 01 01
01 01 01 01 01 01 30 2a 00 98 51 00 2a 40 30 70
13 00 78 2d 11 00 00 1e 00 00 00 fd 00 38 4b 1e
51 0e 00 0a 20 20 20 20 20 20 00 00 00 fc 00 53
79 6e 63 4d 61 73 74 65 72 0a 20 20 00 00 00 ff
00 48 53 47 59 39 30 37 30 33 32 0a 20 20 00 59
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 128
 Raw EDID:
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

A few days after that:
[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 38
 Raw EDID:
30 2a 00 98 51 00 2a 40 30 70 13 00 78 2d 11 00
00 1e 00 00 00 fd 00 38 4b 1e 51 0e 00 0a 20 20
20 20 20 20 00 00 00 fc 00 53 79 6e 63 4d 61 73
74 65 72 0a 20 20 00 00 00 ff 00 48 53 47 59 39
30 37 30 33 32 0a 20 20 00 59 ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
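
For reference, the check that drm_edid_block_valid performs is simple:
a 128-byte EDID block is valid only if the byte sum of all 128 bytes
(including the final checksum byte) is 0 modulo 256, and the logged
"remainder" is that sum. A minimal user-space sketch of the same check
(not the kernel's exact code):

#include <stdint.h>
#include <stddef.h>

/* Sum all 128 bytes of an EDID block; 0 means the checksum passes,
 * anything else is the "remainder" drm_edid_block_valid complains
 * about. uint8_t arithmetic wraps modulo 256 by itself. */
static uint8_t edid_block_remainder(const uint8_t block[128])
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < 128; i++)
        sum += block[i];
    return sum;
}

This matches the second dump above: a block of 128 0xff bytes sums to
128 * 255 = 32640, which is 128 modulo 256, exactly the logged
"remainder is 128".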

Card is:
ATI Technologies Inc RV730XT [Radeon HD 4670] [1002:9490] (prog-if 00 [VGA controller])


get-edid |parse-edid


get-edid: get-edid version 2.0.0

Performing real mode VBE call
Interrupt 0x10 ax=0x4f00 bx=0x0 cx=0x0
Function supported
Call successful

VBE version 300
VBE string at 0xc01dc "ATI ATOMBIOS"

VBE/DDC service about to be called
Report DDC capabilities

Performing real mode VBE call
Interrupt 0x10 ax=0x4f15 bx=0x0 cx=0x0
Function supported
Call successful

Monitor and video card combination does not support DDC1 transfers
Monitor and video card combination supports DDC2 transfers
0 seconds per 128 byte EDID block transfer
Screen is not blanked during DDC transfer

Reading next EDID block

VBE/DDC service about to be called
Read EDID

Performing real mode VBE call
Interrupt 0x10 ax=0x4f15 bx=0x1 cx=0x0
Function supported
Call successful

parse-edid: EDID checksum passed.

# EDID version 1 

[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Daniel Vetter
On Mon, Dec 05, 2011 at 11:04:09PM +0100, Arnd Bergmann wrote:
> On Monday 05 December 2011 21:58:39 Daniel Vetter wrote:
> > On Mon, Dec 05, 2011 at 08:29:49PM +0100, Arnd Bergmann wrote:
> > > ...
> >
> > Thanks a lot for this excellent overview. I think at least for the first
> > version of dmabuf we should drop the sync_* interfaces and simply require
> > users to bracket their usage of the buffer from the attached device by
> > map/unmap. A dma_buf provider is always free to cache the mapping and
> > simply call the sync_sg_for_* functions of the streaming dma api.
>
> I think we still have the same problem if we allow multiple drivers
> to access a noncoherent buffer using map/unmap:
>
>         driver A        driver B
>
> 1.      read/write
> 2.                      read/write
> 3.      map()
> 4.                      read/write
> 5.      dma
> 6.                      map()
> 7.      dma
> 8.                      dma
> 9.      unmap()
> 10.                     dma
> 11.     read/write
> 12.                     unmap()
> 
>
>
> In step 4, the buffer is owned by device A, but accessed by driver B, which
> is a bug. In step 11, the buffer is owned by device B but accessed by driver
> A, which is the same bug on the other side. In steps 7 and 8, the buffer
> is owned by both device A and B, which is currently undefined but would
> be ok if both devices are on the same coherency domain. Whether that point
> is meaningful depends on what the devices actually do. It would be ok
> if both are only reading, but not if they write into the same location
> concurrently.
>
> As I mentioned originally, the problem could be completely avoided if
> we only allow consistent (e.g. uncached) mappings or buffers that
> are not mapped into the kernel virtual address space at all.
>
> Alternatively, a clearer model would be to require each access to
> nonconsistent buffers to be exclusive: a map() operation would have
> to block until the current mapper (if any) has done an unmap(), and
> any access from the CPU would also have to call a dma_buf_ops pointer
> to serialize the CPU accesses with any device accesses. User
> mappings of the buffer can be easily blocked during a DMA access
> by unmapping the buffer from user space at map() time and blocking the
> vm_ops->fault() operation until the unmap().

See my other mail where I propose a more explicit coherency model, just a
comment here: GPU drivers hate blocking interfaces. Loathe, actually. In
general they're very happy to extend you any amount of rope if it can make
userspace a few percent faster.

So I think the right answer here is: You've asked for trouble, you've got
it. Also see the issue raised by Rob: at least for opengl (and also for
other graphics interfaces), the kernel is not even aware of all outstanding
rendering. So userspace needs to orchestrate access anyway if a gpu is
involved.

Otherwise I agree with your points in this mail.
-Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Daniel Vetter
On Mon, Dec 05, 2011 at 04:11:46PM -0600, Rob Clark wrote:
> On Mon, Dec 5, 2011 at 3:23 PM, Daniel Vetter  wrote:
> > On Mon, Dec 05, 2011 at 02:46:47PM -0600, Rob Clark wrote:
> >> I sort of preferred having the DMABUF shim because that lets you pass
> >> a buffer around userspace without the receiving code knowing about a
> >> device specific API.  But the problem I eventually came around to: if
> >> your GL stack (or some other userspace component) is batching up
> >> commands before submission to kernel, the buffers you need to wait for
> >> completion might not even be submitted yet.  So from kernel
> >> perspective they are "ready" for cpu access.  Even though in fact they
> >> are not in a consistent state from rendering perspective.  I don't
> >> really know a sane way to deal with that.  Maybe the approach instead
> >> should be a userspace level API (in libkms/libdrm?) to provide
> >> abstraction for userspace access to buffers rather than dealing with
> >> this at the kernel level.
> >
> > Well, there's a reason GL has an explicit flush and extensions for sync
> > objects. It's to support such scenarios where the driver batches up gpu
> > commands before actually submitting them.
>
> Hmm.. what about other non-GL APIs..  maybe vaapi/vdpau or similar?
> (Or something that I haven't thought of.)

They generally all have a concept of when they've actually committed the
rendering to an X pixmap or egl image. Usually it's rather implicit, e.g.
the driver will submit any outstanding batches before returning from any
calls.
-Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


i915: eDP regression

2011-12-05 Thread Kirill A. Shutemov
On Mon, Dec 05, 2011 at 09:37:15AM -0500, Adam Jackson wrote:
> On Sat, 2011-12-03 at 19:35 +0200, Kirill A. Shutemov wrote:
> > Hi,
> > 
> > Commit dc22ee6 introduces a regression on my laptop, an HP EliteBook 8440p.  I see
> > nothing on the panel after mode setting. Reverting the commit fixes the 
> > issue.
> 
> Try this patch (might need rediffing):
> 
> http://www.mail-archive.com/intel-gfx at lists.freedesktop.org/msg05889.html

Rediffing looks non-trivial (hunk #2 of drivers/gpu/drm/i915/intel_dp.c).
Could you provide a patch?

-- 
 Kirill A. Shutemov


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Arnd Bergmann
On Monday 05 December 2011 14:46:47 Rob Clark wrote:
> On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann  wrote:
> > On Friday 02 December 2011, Sumit Semwal wrote:
> >> This is the first step in defining a dma buffer sharing mechanism.
> >
> > This looks very nice, but there are a few things I don't understand yet
> > and a bunch of trivial comments I have about things I spotted.
> >
> > Do you have prototype exporter and consumer drivers that you can post
> > for clarification?
> 
> There are some dummy drivers based on an earlier version.  And airlied
> has a prime (multi-gpu) prototype:
> 
> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf
> 
> I've got a nearly working camera+display prototype:
> 
> https://github.com/robclark/kernel-omap4/commits/dmabuf

Ok, thanks. I think it would be good to post these for reference
in v3, with a clear indication that they are not being submitted
for discussion/inclusion yet.

> > In the patch 2, you have a section about migration that mentions that
> > it is possible to export a buffer that can be migrated after it
> > is already mapped into one user driver. How does that work when
> > the physical addresses are mapped into a consumer device already?
> 
> I think you can do physical migration if you are attached, but
> probably not if you are mapped.

Ok, that's what I thought.

> > You probably mean "if (ret)" here instead of "if (!ret)", right?
> >
> >> + /* allow allocator to take care of cache ops */
> >> + void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
> >> + void (*sync_sg_for_device)(struct dma_buf *, struct device *);
> >
> > I don't see how this works with multiple consumers: For the streaming
> > DMA mapping, there must be exactly one owner, either the device or
> > the CPU. Obviously, this rule needs to be extended when you get to
> > multiple devices and multiple device drivers, plus possibly user
> > mappings. Simply assigning the buffer to "the device" from one
> > driver does not block other drivers from touching the buffer, and
> > assigning it to "the cpu" does not stop other hardware that the
> > code calling sync_sg_for_cpu is not aware of.
> >
> > The only way to solve this that I can think of right now is to
> > mandate that the mappings are all coherent (i.e. noncachable
> > on noncoherent architectures like ARM). If you do that, you no
> > longer need the sync_sg_for_* calls.
> 
> My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
> ioctls and corresponding dmabuf ops, which userspace is required to
> call before / after CPU access.  Or just remove mmap() and do the
> mmap() via allocating device and use that device's equivalent
> DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
> would give you a way to (a) synchronize with gpu/asynchronous
> pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
> buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
> that gives you a convenient place to do cache operations on
> noncoherent architecture.

I wasn't even thinking of user mappings; as I replied to Daniel, I
think they are easy to solve (maybe not efficiently though).

> I sort of preferred having the DMABUF shim because that lets you pass
> a buffer around userspace without the receiving code knowing about a
> device specific API.  But the problem I eventually came around to: if
> your GL stack (or some other userspace component) is batching up
> commands before submission to kernel, the buffers you need to wait for
> completion might not even be submitted yet.  So from kernel
> perspective they are "ready" for cpu access.  Even though in fact they
> are not in a consistent state from rendering perspective.  I don't
> really know a sane way to deal with that.  Maybe the approach instead
> should be a userspace level API (in libkms/libdrm?) to provide
> abstraction for userspace access to buffers rather than dealing with
> this at the kernel level.

It would be nice if user space had no way to block out kernel drivers,
otherwise we have to be very careful to ensure that each map() operation
can be interrupted by a signal as the last resort to avoid deadlocks.

Arnd


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Arnd Bergmann
On Monday 05 December 2011 21:58:39 Daniel Vetter wrote:
> On Mon, Dec 05, 2011 at 08:29:49PM +0100, Arnd Bergmann wrote:
> > ...
> 
> Thanks a lot for this excellent overview. I think at least for the first
> version of dmabuf we should drop the sync_* interfaces and simply require
> users to bracket their usage of the buffer from the attached device by
> map/unmap. A dma_buf provider is always free to cache the mapping and
> simply call the sync_sg_for_* functions of the streaming dma api.

I think we still have the same problem if we allow multiple drivers
to access a noncoherent buffer using map/unmap:

        driver A        driver B

1.      read/write
2.                      read/write
3.      map()
4.                      read/write
5.      dma
6.                      map()
7.      dma
8.                      dma
9.      unmap()
10.                     dma
11.     read/write
12.                     unmap()



In step 4, the buffer is owned by device A, but accessed by driver B, which
is a bug. In step 11, the buffer is owned by device B but accessed by driver
A, which is the same bug on the other side. In steps 7 and 8, the buffer
is owned by both device A and B, which is currently undefined but would
be ok if both devices are on the same coherency domain. Whether that point
is meaningful depends on what the devices actually do. It would be ok
if both are only reading, but not if they write into the same location
concurrently.

As I mentioned originally, the problem could be completely avoided if
we only allow consistent (e.g. uncached) mappings or buffers that
are not mapped into the kernel virtual address space at all.

Alternatively, a clearer model would be to require each access to
nonconsistent buffers to be exclusive: a map() operation would have
to block until the current mapper (if any) has done an unmap(), and
any access from the CPU would also have to call a dma_buf_ops pointer
to serialize the CPU accesses with any device accesses. User
mappings of the buffer can be easily blocked during a DMA access
by unmapping the buffer from user space at map() time and blocking the
vm_ops->fault() operation until the unmap().
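
A minimal sketch of that exclusive-map idea (illustration only, not code
from this RFC; struct excl_buf and both helpers are made up). Taking the
lock interruptibly keeps a signal as the last resort against deadlocks:

#include <linux/err.h>
#include <linux/mutex.h>
#include <linux/scatterlist.h>

struct excl_buf {
    struct mutex owner_lock;    /* held for the whole map..unmap span */
    struct sg_table *sgt;       /* the buffer's pages */
};

static struct sg_table *excl_map(struct excl_buf *buf)
{
    int ret = mutex_lock_interruptible(&buf->owner_lock);

    if (ret)
        return ERR_PTR(ret);    /* a signal interrupted the wait */
    return buf->sgt;            /* caller is now the sole owner */
}

static void excl_unmap(struct excl_buf *buf)
{
    mutex_unlock(&buf->owner_lock);  /* the next mapper may proceed */
}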

> If it later turns out that we want to be able to cache the sg list also on
> the use-site in the driver (e.g. map it into some hw sg list) we can
> always add that functionality later. I just fear that sync ops among N
> devices is a bit ill-defined and we already have a plethora of ill-defined
> issues at hand. Also the proposed api doesn't quite fit into what's
> already there, I think an s/device/dma_buf_attachment/ would be more
> consistent - otherwise the dmabuf provider might need to walk the list of
> attachments to get at the right one for the device.

Right, at least for the start, let's mandate just map/unmap and not provide
sync. I do wonder however whether we should implement consistent (possibly
uncached) or streaming (cacheable, but always owned by either the device
or the CPU, not both) buffers, or who gets to make the decision which
one is used if both are implemented.

> > > The map call gets the dma_data_direction parameter, so it should be able
> > > to do the right thing. And because we keep the attachment around, any
> > > caching of mappings should be possible, too.
> > >
> > > Yours, Daniel
> > >
> > > PS: Slightly related, because it will make the coherency nightmare worse,
> > > afaict: Can we kill mmap support?
> >
> > The mmap support is required (and only makes sense) for consistent mappings,
> > not for streaming mappings. The provider must ensure that if you map
> > something uncacheable into the kernel in order to provide consistency,
> > any mapping into user space must also be uncacheable. A driver
> > that wants to have the buffer mapped to user space, as many do, should
> > not need to know whether it is required to do cacheable or uncacheable
> > mapping because of some other driver, and it should not need to worry
> > about how to set up uncached mappings in user space.
> 
> Either I've always missed it or no one ever described it that concisely,
> but now I see the use-case for mmap: Simpler drivers (i.e. not gpus) might
> need to expose a userspace mapping and only the provider knows how to do
> that in a coherent fashion. I want this in the docs (if it's not there yet
> ...).

It's currently implemented in the ARM/PowerPC-specific dma_mmap_coherent
function and documented (hardly) in arch/arm/include/asm/dma-mapping.h.

We should make it clear that this is actually an extension of the
regular dma mapping api that first needs to be made generic.
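
A hedged sketch of how an exporter could use it, assuming dma_buf_ops
eventually grow an mmap hook (this RFC has none) and that the generic
version keeps dma_mmap_coherent's current ARM signature; struct my_buf
is made up:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/mm.h>

struct my_buf {                     /* hypothetical exporter state */
    struct device *dev;
    void *vaddr;                    /* from dma_alloc_coherent() */
    dma_addr_t dma_handle;
};

static int my_dmabuf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
{
    struct my_buf *b = dmabuf->priv;

    /* The user mapping inherits the same (un)cached attributes as
     * the kernel mapping, so consistency is preserved. */
    return dma_mmap_coherent(b->dev, vma, b->vaddr, b->dma_handle,
                             vma->vm_end - vma->vm_start);
}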

> But even with that use-case in mind I still have some gripes with the
> current mmap interfaces as proposed:
> - This use-case explains 

[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Daniel Vetter
On Mon, Dec 05, 2011 at 02:46:47PM -0600, Rob Clark wrote:
> On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann  wrote:
> > In the patch 2, you have a section about migration that mentions that
> > it is possible to export a buffer that can be migrated after it
> > is already mapped into one user driver. How does that work when
> > the physical addresses are mapped into a consumer device already?
>
> I think you can do physical migration if you are attached, but
> probably not if you are mapped.

Yeah, that's very much how I see this, and also why map/unmap (at least
for simple users like v4l) should only bracket actual usage. GPU memory
managers need to be able to move around buffers while no one is using
them.

[snip]

> >> +     /* allow allocator to take care of cache ops */
> >> +     void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
> >> +     void (*sync_sg_for_device)(struct dma_buf *, struct device *);
> >
> > I don't see how this works with multiple consumers: For the streaming
> > DMA mapping, there must be exactly one owner, either the device or
> > the CPU. Obviously, this rule needs to be extended when you get to
> > multiple devices and multiple device drivers, plus possibly user
> > mappings. Simply assigning the buffer to "the device" from one
> > driver does not block other drivers from touching the buffer, and
> > assigning it to "the cpu" does not stop other hardware that the
> > code calling sync_sg_for_cpu is not aware of.
> >
> > The only way to solve this that I can think of right now is to
> > mandate that the mappings are all coherent (i.e. noncachable
> > on noncoherent architectures like ARM). If you do that, you no
> > longer need the sync_sg_for_* calls.
>
> My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
> ioctls and corresponding dmabuf ops, which userspace is required to
> call before / after CPU access.  Or just remove mmap() and do the
> mmap() via allocating device and use that device's equivalent
> DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
> would give you a way to (a) synchronize with gpu/asynchronous
> pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
> buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
> that gives you a convenient place to do cache operations on
> noncoherent architecture.
>
> I sort of preferred having the DMABUF shim because that lets you pass
> a buffer around userspace without the receiving code knowing about a
> device specific API.  But the problem I eventually came around to: if
> your GL stack (or some other userspace component) is batching up
> commands before submission to kernel, the buffers you need to wait for
> completion might not even be submitted yet.  So from kernel
> perspective they are "ready" for cpu access.  Even though in fact they
> are not in a consistent state from rendering perspective.  I don't
> really know a sane way to deal with that.  Maybe the approach instead
> should be a userspace level API (in libkms/libdrm?) to provide
> abstraction for userspace access to buffers rather than dealing with
> this at the kernel level.

Well, there's a reason GL has an explicit flush and extensions for sync
objects. It's to support such scenarios where the driver batches up gpu
commands before actually submitting them. Also, recent gpus all have (or
will shortly grow) multiple execution pipelines, so it's also important
that you sync up with the right command stream. Syncing up with all of
them is generally frowned upon for obvious reasons ;-)

So any userspace that interacts with an OpenGL driver needs to take care
of this anyway. But I think for simpler stuff (v4l) kernel-only coherency
should work, and userspace just needs to take care of gl interactions and
call glFlush and friends at the right points. I think we can flesh this
out precisely when we spec the dmabuf EGL extension ... (or implement one
of the preexisting ones already around).

On the topic of a coherency model for dmabuf, I think we need to look at
dma_buf_attachment_map/unmap (and also the mmap variants cpu_start and
cpu_finish or whatever they might get called) as barriers:

So after a dma_buf_map, all previously completed dma operations (i.e.
unmap already called) and any cpu writes (i.e. cpu_finish called) will be
coherent. A similar rule holds for cpu access through the userspace mmap:
only writes completed before the cpu_start will show up.

Similarly, writes done by the device are only guaranteed to show up after
the _unmap. Ditto for cpu writes and cpu_finish.

In short we always need two function calls to denote the start/end of the
"critical section".

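A hedged sketch of those two barriers from a consumer driver's side,
using the map/unmap calls as proposed in this RFC (the signatures may
well change); my_start_dma()/my_wait_dma() are made-up stand-ins for the
device's actual DMA machinery:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>

extern int my_start_dma(struct device *dev, struct sg_table *sgt);
extern void my_wait_dma(struct device *dev);

static int my_use_buffer(struct dma_buf_attachment *attach)
{
    struct sg_table *sgt;
    int ret;

    /* Barrier in: every earlier unmap and cpu_finish is coherent
     * for our device once this returns. */
    sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
    if (IS_ERR(sgt))
        return PTR_ERR(sgt);

    ret = my_start_dma(attach->dev, sgt);
    if (!ret)
        my_wait_dma(attach->dev);

    /* Barrier out: only now are this device's writes guaranteed
     * to be visible to the next mapper. */
    dma_buf_unmap_attachment(attach, sgt);
    return ret;
}
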
Any concurrent operations are allowed to yield garbage, meaning any
combination of the old or either of the newly written contents (i.e.
non-overlapping writes might not actually all end up in the buffer,
but instead some old contents). Maybe we even need to loosen that to
the real "undefined behaviour", but atm I can't 

WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Pekka Enberg
On Mon, Dec 5, 2011 at 9:27 PM, Markus Trippelsdorf wrote:
>> > Yes the patch finally fixes the issue for me (tested with 120 kexec
>> > iterations).
>> > Thanks Jerome!
>>
>> Can you do a kick run on the modified patch ?
>
> This one is also OK after ~60 iterations.

Jerome, could you please include in the changelog a reference to this
LKML thread for context, and credit Markus for reporting and following
up until the issue was fixed?

  Pekka


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Daniel Vetter
On Mon, Dec 05, 2011 at 08:29:49PM +0100, Arnd Bergmann wrote:
> On Monday 05 December 2011 19:55:44 Daniel Vetter wrote:
> > > The only way to solve this that I can think of right now is to
> > > mandate that the mappings are all coherent (i.e. noncachable
> > > on noncoherent architectures like ARM). If you do that, you no
> > > longer need the sync_sg_for_* calls.
> >
> > Woops, totally missed the addition of these. Can somebody explain to someone
> > used to rather coherent x86 what we need these for and what the code-flow
> > would look like in a typical example? I was kinda assuming that devices would
> > bracket their use of a buffer with the attachment_map/unmap calls and any
> > cache coherency magic that might be needed would be somewhat transparent to
> > users of the interface?
>
> I'll describe how the respective functions work in the streaming mapping
> API (dma_map_*): You start out with a buffer that is owned by the CPU,
> i.e. the kernel can access it freely. When you call dma_map_sg or similar,
> a noncoherent device reading the buffer requires the cache to be flushed
> in order to see the data that was written by the CPU into the cache.
>
> After dma_map_sg, the device can perform both read and write accesses
> (depending on the flag to the map call), but the CPU is no longer allowed
> to read (which would allocate a cache line that may become invalid but
> remain marked as clean) or write (which would create a dirty cache line
> without writing it back) that buffer.
>
> Once the device is done, the driver calls dma_unmap_* and the buffer is
> again owned by the CPU. The device can no longer access it (in fact
> the address may no longer be backed if there is an iommu) and the CPU
> can again read and write the buffer. On ARMv6 and higher, possibly some
> other architectures, dma_unmap_* also needs to invalidate the cache
> for the buffer, because due to speculative prefetching, there may also
> be a new clean cache line with stale data from an earlier version of
> the buffer.
>
> Since map/unmap is an expensive operation, the interface was extended
> to pass back the ownership to the CPU and back to the device while leaving
> the buffer mapped. dma_sync_sg_for_cpu invalidates the cache in the same
> way as dma_unmap_sg, so the CPU can access the buffer, and
> dma_sync_sg_for_device hands it back to the device by performing the
> same cache flush that dma_map_sg would do.
>
> You could for example do this if you want video input with a
> cacheable buffer, or in an rdma scenario with a buffer accessed
> by a remote machine.
>
> In case of software iommu (swiotlb, dmabounce), the map and sync
> functions don't do cache management but instead copy data between
> a buffer accessed by hardware and a different buffer accessed
> by the user.

Thanks a lot for this excellent overview. I think at least for the first
version of dmabuf we should drop the sync_* interfaces and simply require
users to bracket their usage of the buffer from the attached device by
map/unmap. A dma_buf provider is always free to cache the mapping and
simply call the sync_sg_for_* functions of the streaming dma api.

If it later turns out that we want to be able to cache the sg list also on
the use-site in the driver (e.g. map it into some hw sg list) we can
always add that functionality later. I just fear that sync ops among N
devices is a bit ill-defined and we already have a plethora of ill-defined
issues at hand. Also the proposed api doesn't quite fit into what's
already there, I think an s/device/dma_buf_attachment/ would be more
consistent - otherwise the dmabuf provider might need to walk the list of
attachments to get at the right one for the device.

> > The map call gets the dma_data_direction parameter, so it should be able
> > to do the right thing. And because we keep the attachment around, any
> > caching of mappings should be possible, too.
> >
> > Yours, Daniel
> >
> > PS: Slightly related, because it will make the coherency nightmare worse,
> > afaict: Can we kill mmap support?
>
> The mmap support is required (and only makes sense) for consistent mappings,
> not for streaming mappings. The provider must ensure that if you map
> something uncacheable into the kernel in order to provide consistency,
> any mapping into user space must also be uncacheable. A driver
> that wants to have the buffer mapped to user space, as many do, should
> not need to know whether it is required to do cacheable or uncacheable
> mapping because of some other driver, and it should not need to worry
> about how to set up uncached mappings in user space.

Either I've always missed it or no one ever described it that concisely,
but now I see the use-case for mmap: Simpler drivers (i.e. not gpus) might
need to expose a userspace mapping and only the provider knows how to do
that in a coherent fashion. I want this in the docs (if it's not there yet
...).

But even with that use-case in mind I still have some gripes with the
current mmap interfaces as 

[BUG] i915/intel-acpi.c: failed to get supported _DSM functions (was: [Dual-LVDS Acer Iconia laptop] i915/DRM issue: one screen stays off)

2011-12-05 Thread Baptiste Jonglez
CC-ing intel-gfx at lists.freedesktop.org (see below)

On Mon, Dec 05, 2011 at 11:00:41AM +0800, joeyli wrote:
> Add Cc. to platform-driver-x86 and linux-acpi
> 
> Hi Baptiste
> 
> On 2011-12-04 at 17:07 +0100, Baptiste Jonglez wrote:
> > Hi,
> > 
> > I've got a lot of troubles with a dual-LVDS Acer laptop (it doesn't
> > have a keyboard, but two displays with touchscreens)
> > 
> > The Intel GPU is integrated into the Core i5-480M CPU: it's a bit
> > older than Sandybridge, as it seems to be based on the Arrandale
> > micro-architecture.
> > 
> > In the BIOS, both displays work fine; but as soon as the kernel boots
> > up, the second display (i.e. the one where you usually find a
> > keyboard) is turned off. The main display works as expected.
> > 
> > xrandr reports two LVDS displays: LVDS1, which is connected, and
> > LVDS2, which is marked as "disconnected". No matter what I tried, I
> > can't bring that second display up.
> > 
> > During the boot, just after the drm is set up, the following message
> > shows up:
> > 
> >   [drm:intel_dsm_pci_probe] *ERROR* failed to get supported _DSM functions
> > 
> > (attached is the relevant part of dmesg [1])
> > 
> > 
> 
> I have no idea about this _DSM error; we need help from drm and acpi experts.

It definitely looks like an ACPI issue.
That function is defined in `drivers/gpu/drm/i915/intel_acpi.c'.
The whole file was added more than a year ago by commit 723bfd707a97
(see the relevant thread on intel-gfx@ [1]) to "add _DSM support".
One of the first comments is about "Calpella", which is exactly the
platform of my laptop (as shown by lshw).

However, I honestly don't know what is wrong with that code...
Is there anything I can provide to sort things out?

> > I then tried booting with "video=LVDS-2:e". The same message shows up
> > while booting, with these two following:
> > 
> >   [drm:drm_crtc_helper_set_config] *ERROR* failed to set mode on [CRTC:4]
> >   fbcon_init: detected unhandled fb_set_par error, error code -22
> > 
> > (attached is the relevant part of dmesg [2])
> > 
> > With that kernel command line forcing LVDS2, the
> > "drm_crtc_helper_set_config" error shows up each time I switch tty;
> > additionally, X does not want to start anymore (spewing out the
> > aforementioned error multiple times before giving up)
> > 
> > 
> > I'm currently using the latest 3.2 kernel from linus' tree
> > (af968e29acd91ebeb4224e899202c46c93171ecd), but the behavior was
> > similar with a vanilla 3.1.2.
> > 
> > 
> > Other notes about this issue:
> > 
> >  - with an Ubuntu 2.6.35 kernel, the second display is on but
> >flickering (with the picture distorted like an old analog TV...).
> >The main display is working fine, as always.

I just bumped into bug report #29821 on freedesktop.org [2], which dates
the "flicker" bug to around 2.6.35. I guess the ubuntu kernel doesn't
have the fix (and actually, the fix might be responsible for the lack
of output on the second display with later kernels.)

I'll try with an older kernel to see what it does.

> >  - with an Archlinux 2.6.37.5 kernel, the behavior is the same as with
> >3.2, the main display is ok and the second one is off.
> > 
> >  - I did succeed, only once and out of pure luck, to get the second
> >screen to work with the 3.1.2 kernel. I haven't been able to
> >reproduce that... I had booted with "video=LVDS-2:e" and let the
> >laptop running ; pressing a key a few hours later turned back
> >*both* displays on (the main display had been turned off by DPMS,
> >and the second, well, was off from the start, as always)
> >While not very helpful, it shows that it's definitely possible.
> > 
> 
> What does Windows platform's behavior? Does there have any physical key
> that can turn on/off the second LVDS on Windows?

Actually, the first thing I did was wiping Windows out :)


[Bug 43538] libdrm-2.4.28: rbo.h is missing.

2011-12-05 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=43538

Arkadiusz Miskiewicz  changed:

           What    |Removed |Added
----------------------------------
             Status|NEW     |RESOLVED
         Resolution|        |FIXED

--- Comment #1 from Arkadiusz Miskiewicz  2011-12-05 
13:34:10 PST ---
Fixed at 902ee661f1864aaf8325621085f6a1b5a6a3673a

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Arnd Bergmann
On Monday 05 December 2011 19:55:44 Daniel Vetter wrote:
> > The only way to solve this that I can think of right now is to
> > mandate that the mappings are all coherent (i.e. noncachable
> > on noncoherent architectures like ARM). If you do that, you no
> > longer need the sync_sg_for_* calls.
> 
> Woops, totally missed the addition of these. Can somebody explain to someone
> used to rather coherent x86 what we need these for and what the code-flow
> would look like in a typical example? I was kinda assuming that devices would
> bracket their use of a buffer with the attachment_map/unmap calls and any
> cache coherency magic that might be needed would be somewhat transparent to
> users of the interface?

I'll describe how the respective functions work in the streaming mapping
API (dma_map_*): You start out with a buffer that is owned by the CPU,
i.e. the kernel can access it freely. When you call dma_map_sg or similar,
a noncoherent device reading the buffer requires the cache to be flushed
in order to see the data that was written by the CPU into the cache.

After dma_map_sg, the device can perform both read and write accesses
(depending on the flag to the map call), but the CPU is no longer allowed
to read (which would allocate a cache line that may become invalid but
remain marked as clean) or write (which would create a dirty cache line
without writing it back) that buffer.

Once the device is done, the driver calls dma_unmap_* and the buffer is
again owned by the CPU. The device can no longer access it (in fact
the address may no longer be backed if there is an iommu) and the CPU
can again read and write the buffer. On ARMv6 and higher, possibly some
other architectures, dma_unmap_* also needs to invalidate the cache
for the buffer, because due to speculative prefetching, there may also
be a new clean cache line with stale data from an earlier version of
the buffer.

Since map/unmap is an expensive operation, the interface was extended
to pass back the ownership to the CPU and back to the device while leaving
the buffer mapped. dma_sync_sg_for_cpu invalidates the cache in the same
way as dma_unmap_sg, so the CPU can access the buffer, and
dma_sync_sg_for_device hands it back to the device by performing the
same cache flush that dma_map_sg would do.

You could for example do this if you want video input with a
cacheable buffer, or in an rdma scenario with a buffer accessed
by a remote machine.

In case of software iommu (swiotlb, dmabounce), the map and sync
functions don't do cache management but instead copy data between
a buffer accessed by hardware and a different buffer accessed
by the user.
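
A hedged sketch of that ownership hand-off cycle (dev/sgl/nents and the
DMA itself are placeholders, not a real driver):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static void streaming_dma_example(struct device *dev,
                                  struct scatterlist *sgl, int nents)
{
    /* CPU owns the buffer: fill it, then hand it to the device
     * (flushes caches on a noncoherent architecture). */
    if (!dma_map_sg(dev, sgl, nents, DMA_BIDIRECTIONAL))
        return;

    /* ... device DMA here; no CPU reads or writes allowed ... */

    /* Hand ownership back to the CPU without unmapping
     * (invalidates caches so the CPU sees the device's writes). */
    dma_sync_sg_for_cpu(dev, sgl, nents, DMA_BIDIRECTIONAL);

    /* ... CPU may read and write the buffer here ... */

    /* Give it back to the device for another round. */
    dma_sync_sg_for_device(dev, sgl, nents, DMA_BIDIRECTIONAL);

    /* ... more device DMA ... */

    /* Final hand-back; the CPU owns the buffer again. */
    dma_unmap_sg(dev, sgl, nents, DMA_BIDIRECTIONAL);
}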

> The map call gets the dma_data_direction parameter, so it should be able
> to do the right thing. And because we keep the attachment around, any
> caching of mappings should be possible, too.
> 
> Yours, Daniel
> 
> PS: Slightly related, because it will make the coherency nightmare worse,
> afaict: Can we kill mmap support?

The mmap support is required (and only makes sense) for consistent mappings,
not for streaming mappings. The provider must ensure that if you map
something uncacheable into the kernel in order to provide consistency,
any mapping into user space must also be uncacheable. A driver
that wants to have the buffer mapped to user space, as many do, should
not need to know whether it is required to do cacheable or uncacheable
mapping because of some other driver, and it should not need to worry
about how to set up uncached mappings in user space.

Arnd


[Bug 43538] New: libdrm-2.4.28: rbo.h is missing.

2011-12-05 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=43538

 Bug #: 43538
   Summary: libdrm-2.4.28: rbo.h is missing.
Classification: Unclassified
   Product: DRI
   Version: XOrg CVS
  Platform: Other
OS/Version: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: libdrm
AssignedTo: dri-devel at lists.freedesktop.org
ReportedBy: michel.hermier at gmail.com


Hi,
It seems the tarball generation is broken and rbo.h is missing from
tests/radeon/, resulting in an obvious gcc error:
  CC rbo.o
radeon_ttm.c:28:17: fatal error: rbo.h: No such file or directory

Thanks for your hard work.

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Daniel Vetter
On Mon, Dec 05, 2011 at 05:18:48PM +, Arnd Bergmann wrote:
> On Friday 02 December 2011, Sumit Semwal wrote:
> > +   /* allow allocator to take care of cache ops */
> > +   void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
> > +   void (*sync_sg_for_device)(struct dma_buf *, struct device *);
>
> I don't see how this works with multiple consumers: For the streaming
> DMA mapping, there must be exactly one owner, either the device or
> the CPU. Obviously, this rule needs to be extended when you get to
> multiple devices and multiple device drivers, plus possibly user
> mappings. Simply assigning the buffer to "the device" from one
> driver does not block other drivers from touching the buffer, and
> assigning it to "the cpu" does not stop other hardware that the
> code calling sync_sg_for_cpu is not aware of.
>
> The only way to solve this that I can think of right now is to
> mandate that the mappings are all coherent (i.e. noncachable
> on noncoherent architectures like ARM). If you do that, you no
> longer need the sync_sg_for_* calls.

Woops, totally missed the addition of these. Can somebody explain to someone
used to rather coherent x86 what we need these for and what the code-flow
would look like in a typical example? I was kinda assuming that devices would
bracket their use of a buffer with the attachment_map/unmap calls and any
cache coherency magic that might be needed would be somewhat transparent to
users of the interface?

The map call gets the dma_data_direction parameter, so it should be able
to do the right thing. And because we keep the attachment around, any
caching of mappings should be possible, too.

Yours, Daniel

PS: Slightly related, because it will make the coherency nightmare worse,
afaict: Can we kill mmap support?
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[PATCH 2/2] drm/omap: add GEM support for tiled/dmm buffers

2011-12-05 Thread Rob Clark
From: Rob Clark 

TILER/DMM provides two features for omapdrm GEM objects:
1) providing a physically contiguous view to discontiguous memory
   for hw initiators that cannot otherwise support discontiguous
   buffers (DSS scanout, IVAHD video decode/encode, etc)
2) providing untiling for 2d tiled buffers, which are used in some
   cases to provide rotation and reduce memory bandwidth for hw
   initiators that tend to access data in 2d block patterns.

For 2d tiled buffers, there are some additional complications when
it comes to userspace mmap'ings.  For non-tiled buffers, the original
(potentially physically discontiguous) pages are used to back the
mmap.  For tiled buffers, we need to mmap via the tiler/dmm region to
provide an unswizzled view of the buffer.  But (a) the buffer is not
necessarily pinned in TILER all the time (it can be unmapped when
there is no DMA access to the buffer), and (b) when they are pinned,
they are not necessarily page aligned from the perspective of the CPU.
And non-page-aligned userspace buffer mapping is evil.

To solve this, we reserve one or more small regions in each of the 2d
containers when the driver is loaded to use as a "user-GART" where we
can create a second page-aligned mapping of parts of the buffer being
accessed from userspace.  Page faulting is used to evict and remap
different regions of whichever buffers are being accessed from user-
space.
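
A much-simplified, hypothetical sketch of that fault path (the real
omap_gem code differs in detail; pick_slot/evict_slot/pin_block/
slot_paddr and struct gem_obj are made up for illustration):

#include <linux/mm.h>

struct gem_obj;
struct usergart_slot;

extern struct usergart_slot *pick_slot(struct gem_obj *obj);
extern void evict_slot(struct usergart_slot *slot);
extern void pin_block(struct usergart_slot *slot, struct gem_obj *obj,
                      unsigned long addr);
extern unsigned long slot_paddr(struct usergart_slot *slot);

static int usergart_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
    struct gem_obj *obj = vma->vm_private_data;
    struct usergart_slot *slot = pick_slot(obj);   /* e.g. round-robin */
    unsigned long addr = (unsigned long)vmf->virtual_address;
    int ret;

    evict_slot(slot);              /* unmap the previous occupant */
    pin_block(slot, obj, addr);    /* pin + remap this block via TILER */

    /* Map the page-aligned TILER view at the faulting address. */
    ret = vm_insert_mixed(vma, addr, slot_paddr(slot) >> PAGE_SHIFT);
    return ret ? VM_FAULT_SIGBUS : VM_FAULT_NOPAGE;
}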

Signed-off-by: Rob Clark 
---
 drivers/staging/omapdrm/TODO   |5 +
 drivers/staging/omapdrm/omap_drv.c |6 +-
 drivers/staging/omapdrm/omap_drv.h |3 +
 drivers/staging/omapdrm/omap_fb.c  |2 +-
 drivers/staging/omapdrm/omap_gem.c |  432 +---
 drivers/staging/omapdrm/omap_gem_helpers.c |   55 
 6 files changed, 466 insertions(+), 37 deletions(-)

diff --git a/drivers/staging/omapdrm/TODO b/drivers/staging/omapdrm/TODO
index 18677e7..55b1837 100644
--- a/drivers/staging/omapdrm/TODO
+++ b/drivers/staging/omapdrm/TODO
@@ -22,6 +22,11 @@ TODO
 . Review DSS vs KMS mismatches.  The omap_dss_device is sort of part encoder,
   part connector.  Which results in a bit of duct tape to fwd calls from
   encoder to connector.  Possibly this could be done a bit better.
+. Solve PM sequencing on resume.  DMM/TILER must be reloaded before any
+  access is made from any component in the system.  Which means on suspend
+  CRTC's should be disabled, and on resume the LUT should be reprogrammed
+  before CRTC's are re-enabled, to prevent DSS from trying to DMA from a
+  buffer mapped in DMM/TILER before LUT is reloaded.
 . Add debugfs information for DMM/TILER

 Userspace:
diff --git a/drivers/staging/omapdrm/omap_drv.c b/drivers/staging/omapdrm/omap_drv.c
index 71de7cf..7ecf578 100644
--- a/drivers/staging/omapdrm/omap_drv.c
+++ b/drivers/staging/omapdrm/omap_drv.c
@@ -509,7 +509,7 @@ static int ioctl_gem_info(struct drm_device *dev, void *data,
return -ENOENT;
}

-   args->size = obj->size;  /* for now */
+   args->size = omap_gem_mmap_size(obj);
args->offset = omap_gem_mmap_offset(obj);

drm_gem_object_unreference_unlocked(obj);
@@ -557,6 +557,8 @@ static int dev_load(struct drm_device *dev, unsigned long flags)

dev->dev_private = priv;

+   omap_gem_init(dev);
+
ret = omap_modeset_init(dev);
if (ret) {
dev_err(dev->dev, "omap_modeset_init failed: ret=%d\n", ret);
@@ -589,8 +591,8 @@ static int dev_unload(struct drm_device *dev)
drm_kms_helper_poll_fini(dev);

omap_fbdev_free(dev);
-
omap_modeset_free(dev);
+   omap_gem_deinit(dev);

kfree(dev->dev_private);
dev->dev_private = NULL;
diff --git a/drivers/staging/omapdrm/omap_drv.h b/drivers/staging/omapdrm/omap_drv.h
index c8f2752..9d0783d 100644
--- a/drivers/staging/omapdrm/omap_drv.h
+++ b/drivers/staging/omapdrm/omap_drv.h
@@ -84,6 +84,8 @@ struct drm_connector *omap_framebuffer_get_next_connector(
 void omap_framebuffer_flush(struct drm_framebuffer *fb,
int x, int y, int w, int h);

+void omap_gem_init(struct drm_device *dev);
+void omap_gem_deinit(struct drm_device *dev);

 struct drm_gem_object *omap_gem_new(struct drm_device *dev,
union omap_gem_size gsize, uint32_t flags);
@@ -109,6 +111,7 @@ int omap_gem_get_paddr(struct drm_gem_object *obj,
dma_addr_t *paddr, bool remap);
 int omap_gem_put_paddr(struct drm_gem_object *obj);
 uint64_t omap_gem_mmap_offset(struct drm_gem_object *obj);
+size_t omap_gem_mmap_size(struct drm_gem_object *obj);

 static inline int align_pitch(int pitch, int width, int bpp)
 {
diff --git a/drivers/staging/omapdrm/omap_fb.c b/drivers/staging/omapdrm/omap_fb.c
index 82ed612..491be53 100644
--- a/drivers/staging/omapdrm/omap_fb.c
+++ b/drivers/staging/omapdrm/omap_fb.c
@@ -102,7 +102,7 @@ int omap_framebuffer_get_buffer(struct drm_framebuffer *fb, int x, int y,
  

[PATCH 1/2] drm/omap: DMM/TILER support for OMAP4+ platform

2011-12-05 Thread Rob Clark
From: Andy Gross 

Dynamic Memory Manager (DMM) is a hardware block in the OMAP4+
processor that contains at least one TILER instance.  TILER, or
Tiling and Isometric Lightweight Engine for Rotation, provides
IOMMU capabilities through the use of a physical address translation
table.  The TILER also provides zero cost rotation and mirroring.

The TILER provides both 1D and 2D access through different views or
address ranges that can be used to access the physical memory that
has been mapped in through the PAT.  Access to the 1D view results in
linear access to the underlying memory.  Access to the 2D views results
in tiled access to the underlying memory, resulting in increased
efficiency.

The TILER address space is managed by a tiler container manager (TCM)
and allocates the address space through the use of the Simple Tiler
Allocation algorithm (SiTA).  The purpose of the algorithm is to keep
fragmentation of the address space as low as possible.

Signed-off-by: Andy Gross 
Signed-off-by: Rob Clark 
---
 drivers/staging/omapdrm/Makefile |   10 +-
 drivers/staging/omapdrm/TODO |1 +
 drivers/staging/omapdrm/omap_dmm_priv.h  |  187 
 drivers/staging/omapdrm/omap_dmm_tiler.c |  672 
 drivers/staging/omapdrm/omap_dmm_tiler.h |  130 ++
 drivers/staging/omapdrm/omap_drm.h   |2 +-
 drivers/staging/omapdrm/omap_drv.c   |   21 +-
 drivers/staging/omapdrm/omap_priv.h  |7 +-
 drivers/staging/omapdrm/tcm-sita.c   |  703 ++
 drivers/staging/omapdrm/tcm-sita.h   |   95 
 drivers/staging/omapdrm/tcm.h|  326 ++
 11 files changed, 2143 insertions(+), 11 deletions(-)
 create mode 100644 drivers/staging/omapdrm/omap_dmm_priv.h
 create mode 100644 drivers/staging/omapdrm/omap_dmm_tiler.c
 create mode 100644 drivers/staging/omapdrm/omap_dmm_tiler.h
 create mode 100644 drivers/staging/omapdrm/tcm-sita.c
 create mode 100644 drivers/staging/omapdrm/tcm-sita.h
 create mode 100644 drivers/staging/omapdrm/tcm.h

diff --git a/drivers/staging/omapdrm/Makefile b/drivers/staging/omapdrm/Makefile
index 4aa9a2f..275054a 100644
--- a/drivers/staging/omapdrm/Makefile
+++ b/drivers/staging/omapdrm/Makefile
@@ -4,7 +4,15 @@
 #

 ccflags-y := -Iinclude/drm -Werror
-omapdrm-y := omap_drv.o omap_crtc.o omap_encoder.o omap_connector.o omap_fb.o omap_fbdev.o omap_gem.o
+omapdrm-y := omap_drv.o \
+   omap_crtc.o \
+   omap_encoder.o \
+   omap_connector.o \
+   omap_fb.o \
+   omap_fbdev.o \
+   omap_gem.o \
+   omap_dmm_tiler.o \
+   tcm-sita.o

 # temporary:
 omapdrm-y += omap_gem_helpers.o
diff --git a/drivers/staging/omapdrm/TODO b/drivers/staging/omapdrm/TODO
index 17781c9..18677e7 100644
--- a/drivers/staging/omapdrm/TODO
+++ b/drivers/staging/omapdrm/TODO
@@ -22,6 +22,7 @@ TODO
 . Review DSS vs KMS mismatches.  The omap_dss_device is sort of part encoder,
   part connector.  Which results in a bit of duct tape to fwd calls from
   encoder to connector.  Possibly this could be done a bit better.
+. Add debugfs information for DMM/TILER

 Userspace:
 . git://github.com/robclark/xf86-video-omap.git
diff --git a/drivers/staging/omapdrm/omap_dmm_priv.h b/drivers/staging/omapdrm/omap_dmm_priv.h
new file mode 100644
index 000..65b990c
--- /dev/null
+++ b/drivers/staging/omapdrm/omap_dmm_priv.h
@@ -0,0 +1,187 @@
+/*
+ *
+ * Copyright (C) 2011 Texas Instruments Incorporated - http://www.ti.com/
+ * Author: Rob Clark 
+ * Andy Gross 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#ifndef OMAP_DMM_PRIV_H
+#define OMAP_DMM_PRIV_H
+
+#define DMM_REVISION  0x000
+#define DMM_HWINFO0x004
+#define DMM_LISA_HWINFO   0x008
+#define DMM_DMM_SYSCONFIG 0x010
+#define DMM_LISA_LOCK 0x01C
+#define DMM_LISA_MAP__0   0x040
+#define DMM_LISA_MAP__1   0x044
+#define DMM_TILER_HWINFO  0x208
+#define DMM_TILER_OR__0   0x220
+#define DMM_TILER_OR__1   0x224
+#define DMM_PAT_HWINFO0x408
+#define DMM_PAT_GEOMETRY  0x40C
+#define DMM_PAT_CONFIG0x410
+#define DMM_PAT_VIEW__0   0x420
+#define DMM_PAT_VIEW__1   0x424
+#define DMM_PAT_VIEW_MAP__0   0x440
+#define DMM_PAT_VIEW_MAP_BASE 0x460
+#define DMM_PAT_IRQ_EOI   0x478
+#define DMM_PAT_IRQSTATUS_RAW 0x480
+#define DMM_PAT_IRQSTATUS 0x490
+#define DMM_PAT_IRQENABLE_SET 0x4A0
+#define DMM_PAT_IRQENABLE_CLR 0x4B0
+#define DMM_PAT_STATUS__0 0x4C0
+#define DMM_PAT_STATUS__1 0x4C4
+#define DMM_PAT_STATUS__2 

[PATCH 0/2] omap/drm: dmm/tiler support for GEM buffers

2011-12-05 Thread Rob Clark
From: Rob Clark 

Support for DMM and tiled buffers.  The DMM/TILER block in omap4+ SoC
provides support for remapping physically discontiguous buffers for
various DMA initiators (DSS, IVAHD, etc) which do not otherwise support
non-physically contiguous buffers, as well as providing support for
tiled buffers.

See the descriptions in the following two patches for more details.

Andy Gross (1):
  drm/omap: DMM/TILER support for OMAP4+ platform

Rob Clark (1):
  drm/omap: add GEM support for tiled/dmm buffers

 drivers/staging/omapdrm/Makefile   |   10 +-
 drivers/staging/omapdrm/TODO   |6 +
 drivers/staging/omapdrm/omap_dmm_priv.h|  187 
 drivers/staging/omapdrm/omap_dmm_tiler.c   |  672 ++
 drivers/staging/omapdrm/omap_dmm_tiler.h   |  130 +
 drivers/staging/omapdrm/omap_drm.h |2 +-
 drivers/staging/omapdrm/omap_drv.c |   27 +-
 drivers/staging/omapdrm/omap_drv.h |3 +
 drivers/staging/omapdrm/omap_fb.c  |2 +-
 drivers/staging/omapdrm/omap_gem.c |  432 --
 drivers/staging/omapdrm/omap_gem_helpers.c |   55 +++
 drivers/staging/omapdrm/omap_priv.h|7 +-
 drivers/staging/omapdrm/tcm-sita.c |  703 
 drivers/staging/omapdrm/tcm-sita.h |   95 
 drivers/staging/omapdrm/tcm.h  |  326 +
 15 files changed, 2609 insertions(+), 48 deletions(-)
 create mode 100644 drivers/staging/omapdrm/omap_dmm_priv.h
 create mode 100644 drivers/staging/omapdrm/omap_dmm_tiler.c
 create mode 100644 drivers/staging/omapdrm/omap_dmm_tiler.h
 create mode 100644 drivers/staging/omapdrm/tcm-sita.c
 create mode 100644 drivers/staging/omapdrm/tcm-sita.h
 create mode 100644 drivers/staging/omapdrm/tcm.h

-- 
1.7.5.4



WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Markus Trippelsdorf
On 2011.12.05 at 12:10 -0500, Jerome Glisse wrote:
> On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
> > On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
> > > On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
> > >  wrote:
> > > > On 2011.12.03 at 12:20 +, Dave Airlie wrote:
> > > >> >> > > > > FIX idr_layer_cache: Marking all objects used
> > > >> >> > > >
> > > >> >> > > > Yesterday I couldn't reproduce the issue at all. But today 
> > > >> >> > > > I've hit
> > > >> >> > > > exactly the same spot again. (CCing the drm list)
> > > >>
> > > >> If I had to guess it looks like 0 is getting written back to some
> > > >> random page by the GPU maybe, it could be that the GPU is in some half
> > > >> setup state at boot or on a reboot does it happen from a cold boot or
> > > >> just warm boot or kexec?
> > > >
> > > > Only happened with kexec thus far. Cold boot seems to be fine.
> > > >
> > > 
> > > Can you add radeon.no_wb=1 to your kexec kernel paramater an see if
> > > you can reproduce.
> > 
> > No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
> > after 700 successful kexec iterations...)
> > 
> 
> Can you try if attached patch fix the issue when you don't pass the
> radeon.no_wb=1 option ?

Yes the patch finally fixes the issue for me (tested with 120 kexec
iterations).
Thanks Jerome!

-- 
Markus


[PATCH 2/2] drm/radeon: allocate semaphore from the ib pool

2011-12-05 Thread j.gli...@gmail.com
From: Jerome Glisse 

This allows sharing the ib pool with semaphores and avoids
having more bos around.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/radeon.h   |   67 -
 drivers/gpu/drm/radeon/radeon_device.c|2 +-
 drivers/gpu/drm/radeon/radeon_ring.c  |5 +-
 drivers/gpu/drm/radeon/radeon_semaphore.c |  157 -
 4 files changed, 131 insertions(+), 100 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8cb6a58..5e35423 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -247,32 +247,6 @@ void radeon_fence_unref(struct radeon_fence **fence);
 int radeon_fence_count_emitted(struct radeon_device *rdev, int ring);

 /*
- * Semaphores.
- */
-struct radeon_ring;
-
-struct radeon_semaphore_driver {
-   rwlock_t            lock;
-   struct list_head    free;
-};
-
-struct radeon_semaphore {
-   struct radeon_bo    *robj;
-   struct list_head    list;
-   uint64_t            gpu_addr;
-};
-
-void radeon_semaphore_driver_fini(struct radeon_device *rdev);
-int radeon_semaphore_create(struct radeon_device *rdev,
-   struct radeon_semaphore **semaphore);
-void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring,
- struct radeon_semaphore *semaphore);
-void radeon_semaphore_emit_wait(struct radeon_device *rdev, int ring,
-   struct radeon_semaphore *semaphore);
-void radeon_semaphore_free(struct radeon_device *rdev,
-  struct radeon_semaphore *semaphore);
-
-/*
  * Tiling registers
  */
 struct radeon_surface_reg {
@@ -410,6 +384,46 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv,
 uint32_t handle);

 /*
+ * Semaphores.
+ */
+struct radeon_ring;
+
+#define RADEON_SEMAPHORE_BO_SIZE    256
+
+struct radeon_semaphore_driver {
+   rwlock_t            lock;
+   struct list_head    bo;
+};
+
+struct radeon_semaphore_bo;
+
+/* everything here is constant */
+struct radeon_semaphore {
+   struct list_head    list;
+   uint64_t            gpu_addr;
+   uint32_t            *cpu_ptr;
+   struct radeon_semaphore_bo  *bo;
+};
+
+struct radeon_semaphore_bo {
+   struct list_head    list;
+   struct radeon_ib    *ib;
+   struct list_head    free;
+   struct radeon_semaphore semaphores[RADEON_SEMAPHORE_BO_SIZE/8];
+   unsigned            nused;
+};
+
+void radeon_semaphore_driver_fini(struct radeon_device *rdev);
+int radeon_semaphore_create(struct radeon_device *rdev,
+   struct radeon_semaphore **semaphore);
+void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring,
+ struct radeon_semaphore *semaphore);
+void radeon_semaphore_emit_wait(struct radeon_device *rdev, int ring,
+   struct radeon_semaphore *semaphore);
+void radeon_semaphore_free(struct radeon_device *rdev,
+  struct radeon_semaphore *semaphore);
+
+/*
  * GART structures, functions & helpers
  */
 struct radeon_mc;
@@ -716,6 +730,7 @@ void r600_blit_suspend(struct radeon_device *rdev);
 int radeon_ib_get(struct radeon_device *rdev, int ring,
  struct radeon_ib **ib, unsigned size);
 void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib);
+bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib);
 int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib);
 int radeon_ib_pool_init(struct radeon_device *rdev);
 void radeon_ib_pool_fini(struct radeon_device *rdev);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 6566860..aa9a11e 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -731,7 +731,7 @@ int radeon_device_init(struct radeon_device *rdev,
INIT_LIST_HEAD(&rdev->gem.objects);
init_waitqueue_head(&rdev->irq.vblank_queue);
init_waitqueue_head(&rdev->irq.idle_queue);
-   INIT_LIST_HEAD(&rdev->semaphore_drv.free);
+   INIT_LIST_HEAD(&rdev->semaphore_drv.bo);
/* initialize vm here */
rdev->vm_manager.use_bitmap = 1;
rdev->vm_manager.max_pfn = 1 << 20;
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c 
b/drivers/gpu/drm/radeon/radeon_ring.c
index 5f9edea..4fe320f 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -77,8 +77,7 @@ void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
 /*
  * IB.
  */
-static bool radeon_ib_try_free(struct radeon_device *rdev,
-  struct radeon_ib *ib)
bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib)

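For context (the rest of radeon_ring.c is cut off in this archive): the
scheme above carves each 256-byte semaphore BO into
RADEON_SEMAPHORE_BO_SIZE/8 = 32 eight-byte slots handed out from a per-BO
free list. A minimal sketch of what the allocation fast path could look
like, assuming only the structures quoted above (the posted
radeon_semaphore.c body is not shown here):

/* sketch only -- hypothetical helper, not the posted code */
static struct radeon_semaphore *
semaphore_bo_alloc(struct radeon_semaphore_bo *bo)
{
	struct radeon_semaphore *sem;

	if (list_empty(&bo->free))
		return NULL;		/* all 32 slots of this BO in use */
	sem = list_first_entry(&bo->free, struct radeon_semaphore, list);
	list_del(&sem->list);		/* take the slot off the free list */
	bo->nused++;
	*sem->cpu_ptr = 0;		/* reset the 8-byte semaphore value */
	return sem;
}
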
[PATCH 1/2] drm/radeon: make ib size variable

2011-12-05 Thread j.gli...@gmail.com
From: Jerome Glisse 

This avoids wasting ib pool space and avoids a bunch of waits for
the previous ib to finish.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/r100.c  |2 +-
 drivers/gpu/drm/radeon/r600.c  |2 +-
 drivers/gpu/drm/radeon/r600_blit_kms.c |   16 +---
 drivers/gpu/drm/radeon/radeon.h|3 ++-
 drivers/gpu/drm/radeon/radeon_cs.c |6 --
 drivers/gpu/drm/radeon/radeon_ring.c   |7 +--
 6 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 657040b..947ba22 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -3708,7 +3708,7 @@ int r100_ib_test(struct radeon_device *rdev)
return r;
}
WREG32(scratch, 0xCAFEDEAD);
-   r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, &ib);
+   r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, &ib, 256);
if (r) {
return r;
}
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 951566f..4f08e5e 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2711,7 +2711,7 @@ int r600_ib_test(struct radeon_device *rdev, int ring)
return r;
}
WREG32(scratch, 0xCAFEDEAD);
-   r = radeon_ib_get(rdev, ring, &ib);
+   r = radeon_ib_get(rdev, ring, &ib, 256);
if (r) {
DRM_ERROR("radeon: failed to get ib (%d).\n", r);
return r;
diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c 
b/drivers/gpu/drm/radeon/r600_blit_kms.c
index 02a7574..d996f43 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -619,16 +619,17 @@ void r600_blit_fini(struct radeon_device *rdev)
	radeon_bo_unref(&rdev->r600_blit.shader_obj);
 }

-static int r600_vb_ib_get(struct radeon_device *rdev)
+static int r600_vb_ib_get(struct radeon_device *rdev, unsigned size)
 {
int r;
-   r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, &rdev->r600_blit.vb_ib);
+   r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX,
+                     &rdev->r600_blit.vb_ib, size);
if (r) {
DRM_ERROR("failed to get IB for vertex buffer\n");
return r;
}

-   rdev->r600_blit.vb_total = 64*1024;
+   rdev->r600_blit.vb_total = size;
rdev->r600_blit.vb_used = 0;
return 0;
 }
@@ -693,10 +694,6 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, 
unsigned num_gpu_pages)
int num_loops = 0;
int dwords_per_loop = rdev->r600_blit.ring_size_per_loop;

-   r = r600_vb_ib_get(rdev);
-   if (r)
-   return r;
-
/* num loops */
while (num_gpu_pages) {
num_gpu_pages -=
@@ -705,6 +702,11 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, 
unsigned num_gpu_pages)
num_loops++;
}

+   /* 48 bytes for vertex per loop */
+   r = r600_vb_ib_get(rdev, (num_loops*48)+256);
+   if (r)
+   return r;
+
/* calculate number of loops correctly */
ring_size = num_loops * dwords_per_loop;
ring_size += rdev->r600_blit.ring_size_common;
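
(As a worked example with illustrative numbers only: a copy that breaks
down into num_loops = 8 would request 8*48 + 256 = 640 bytes for the
vertex IB, instead of the fixed 64K the old r600_vb_ib_get always took,
so small blits no longer consume a full-size pool slot.)
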
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 6673f19..8cb6a58 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -713,7 +713,8 @@ struct r600_blit {

 void r600_blit_suspend(struct radeon_device *rdev);

-int radeon_ib_get(struct radeon_device *rdev, int ring, struct radeon_ib **ib);
+int radeon_ib_get(struct radeon_device *rdev, int ring,
+ struct radeon_ib **ib, unsigned size);
 void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib);
 int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib);
 int radeon_ib_pool_init(struct radeon_device *rdev);
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index b3bbf37..fdfc31b 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -288,7 +288,8 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
 * input memory (cached) and write to the IB (which can be
 * uncached).
 */
-   r =  radeon_ib_get(rdev, parser->ring, &parser->ib);
+   r =  radeon_ib_get(rdev, parser->ring, &parser->ib,
+  ib_chunk->length_dw * 4);
if (r) {
DRM_ERROR("Failed to get ib !\n");
return r;
@@ -348,7 +349,8 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
DRM_ERROR("cs IB too big: %d\n", ib_chunk->length_dw);
return -EINVAL;
}
-   r =  radeon_ib_get(rdev, parser->ring, &parser->ib);
+   r =  radeon_ib_get(rdev, parser->ring, &parser->ib,
+  ib_chunk->length_dw * 4);
if (r) {

Make ib allocation size function of cs size

2011-12-05 Thread j.gli...@gmail.com
The following two patches are on top of
http://cgit.freedesktop.org/~glisse/linux

They make the ib allocation size a function of the cs size, which
avoids wasting pool space and avoids triggering fence_wait in
ib_get. I am still evaluating how many fence_waits we avoid
with this.

Cheers,
Jerome
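
For readers skimming the series, the new call pattern in a nutshell -- a
minimal sketch using names from the patches above (error handling
abbreviated; any rounding done by the pool allocator is not shown in
this excerpt):

	struct radeon_ib *ib;
	int r;

	/* size the IB from the command stream instead of a fixed 64K slot */
	r = radeon_ib_get(rdev, parser->ring, &ib, ib_chunk->length_dw * 4);
	if (r)
		return r;	/* pool exhausted or fence wait failed */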



[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Arnd Bergmann
On Friday 02 December 2011, Sumit Semwal wrote:
> This is the first step in defining a dma buffer sharing mechanism.

This looks very nice, but there are a few things I don't understand yet
and a bunch of trivial comments I have about things I spotted.

Do you have prototype exporter and consumer drivers that you can post
for clarification?

In the patch 2, you have a section about migration that mentions that
it is possible to export a buffer that can be migrated after it
is already mapped into one user driver. How does that work when
the physical addresses are mapped into a consumer device already?

> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 21cf46f..07d8095 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -174,4 +174,14 @@ config SYS_HYPERVISOR
>  
>  source "drivers/base/regmap/Kconfig"
>  
> +config DMA_SHARED_BUFFER
> + bool "Buffer framework to be shared between drivers"
> + default n
> + depends on ANON_INODES

I would make this 'select ANON_INODES', like the other users of this
feature.

> + return dmabuf;
> +}
> +EXPORT_SYMBOL(dma_buf_export);

I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
because it's really a low-level function that I would expect
to get used by in-kernel subsystems providing the feature to
users and having back-end drivers, but it's not the kind of thing
we want out-of-tree drivers to mess with.

> +/**
> + * dma_buf_fd - returns a file descriptor for the given dma_buf
> + * @dmabuf:  [in]pointer to dma_buf for which fd is required.
> + *
> + * On success, returns an associated 'fd'. Else, returns error.
> + */
> +int dma_buf_fd(struct dma_buf *dmabuf)
> +{
> + int error, fd;
> +
> + if (!dmabuf->file)
> + return -EINVAL;
> +
> + error = get_unused_fd_flags(0);

Why not simply get_unused_fd()?

> +/**
> + * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
> + * calls attach() of dma_buf_ops to allow device-specific attach 
> functionality
> + * @dmabuf:  [in]buffer to attach device to.
> + * @dev: [in]device to be attached.
> + *
> + * Returns struct dma_buf_attachment * for this attachment; may return NULL.
> + *

Or may return a negative error code. It's better to be consistent here:
either always return NULL on error, or change the allocation error to
ERR_PTR(-ENOMEM).
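
Concretely, the consistent variant being asked for here would look
something like this (a sketch, not the posted code):

	attach = kzalloc(sizeof(*attach), GFP_KERNEL);
	if (!attach)
		return ERR_PTR(-ENOMEM);  /* callers check IS_ERR(), never NULL */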

> + */
> +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> + struct device *dev)
> +{
> + struct dma_buf_attachment *attach;
> + int ret;
> +
> + BUG_ON(!dmabuf || !dev);
> +
> + attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
> + if (attach == NULL)
> + goto err_alloc;
> +
> + mutex_lock(&dmabuf->lock);
> +
> + attach->dev = dev;
> + attach->dmabuf = dmabuf;
> + if (dmabuf->ops->attach) {
> + ret = dmabuf->ops->attach(dmabuf, dev, attach);
> + if (!ret)
> + goto err_attach;

You probably mean "if (ret)" here instead of "if (!ret)", right?
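
i.e. the intended error path is presumably:

	if (dmabuf->ops->attach) {
		ret = dmabuf->ops->attach(dmabuf, dev, attach);
		if (ret)	/* bail out on failure, not on success */
			goto err_attach;
	}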

> + /* allow allocator to take care of cache ops */
> + void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
> + void (*sync_sg_for_device)(struct dma_buf *, struct device *);

I don't see how this works with multiple consumers: For the streaming
DMA mapping, there must be exactly one owner, either the device or
the CPU. Obviously, this rule needs to be extended when you get to
multiple devices and multiple device drivers, plus possibly user
mappings. Simply assigning the buffer to "the device" from one
driver does not block other drivers from touching the buffer, and
assigning it to "the cpu" does not stop other hardware that the
code calling sync_sg_for_cpu is not aware of.

The only way to solve this that I can think of right now is to
mandate that the mappings are all coherent (i.e. noncachable
on noncoherent architectures like ARM). If you do that, you no
longer need the sync_sg_for_* calls.
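
For reference, the single-owner rule described here is the standard
streaming-DMA pattern; with a single device it looks like the following
sketch (generic DMA API, nothing dma-buf specific):

static int read_from_device(struct device *dev, struct scatterlist *sgl, int n)
{
	int nents;

	/* the device owns the buffer while it is mapped for DMA */
	nents = dma_map_sg(dev, sgl, n, DMA_FROM_DEVICE);
	if (!nents)
		return -ENOMEM;
	/* ... let the device write into the buffer ... */
	/* hand ownership to the CPU before reading the data */
	dma_sync_sg_for_cpu(dev, sgl, n, DMA_FROM_DEVICE);
	/* ... CPU reads ... */
	/* give ownership back to the device before the next transfer */
	dma_sync_sg_for_device(dev, sgl, n, DMA_FROM_DEVICE);
	return 0;
}

With several attached devices there is no single dev to sync against,
which is exactly the problem being pointed out.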

> +#ifdef CONFIG_DMA_SHARED_BUFFER

Do you have a use case for making the interface compile-time disabled?
I had assumed that any code using it would make no sense if it's not
available so you don't actually need this.

Arnd


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Rob Clark
On Mon, Dec 5, 2011 at 4:09 PM, Arnd Bergmann  wrote:
> On Monday 05 December 2011 14:46:47 Rob Clark wrote:
>> I sort of preferred having the DMABUF shim because that lets you pass
>> a buffer around userspace without the receiving code knowing about a
>> device specific API.  But the problem I eventually came around to: if
>> your GL stack (or some other userspace component) is batching up
>> commands before submission to kernel, the buffers you need to wait for
>> completion might not even be submitted yet.  So from kernel
>> perspective they are "ready" for cpu access.  Even though in fact they
>> are not in a consistent state from rendering perspective.  I don't
>> really know a sane way to deal with that.  Maybe the approach instead
>> should be a userspace level API (in libkms/libdrm?) to provide
>> abstraction for userspace access to buffers rather than dealing with
>> this at the kernel level.
>
> It would be nice if user space had no way to block out kernel drivers,
> otherwise we have to be very careful to ensure that each map() operation
> can be interrupted by a signal as the last resort to avoid deadlocks.

map_dma_buf should be documented to be allowed to return -EINTR..
otherwise, yeah, that would be problematic.
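
A sketch of what that documented contract could look like in an
exporter's map_dma_buf (hypothetical body; struct my_buffer and its
fields are made up for illustration):

static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *attach,
				       enum dma_data_direction dir)
{
	struct my_buffer *buf = attach->dmabuf->priv;

	/* may sleep waiting for rendering; allow signals so userspace
	 * cannot block other kernel drivers indefinitely */
	if (mutex_lock_interruptible(&buf->lock))
		return ERR_PTR(-EINTR);
	/* ... build and dma_map the sg_table here ... */
	mutex_unlock(&buf->lock);
	return buf->sgt;
}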

>	Arnd


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Rob Clark
On Mon, Dec 5, 2011 at 4:09 PM, Arnd Bergmann  wrote:
>>
>> https://github.com/robclark/kernel-omap4/commits/dmabuf
>
> Ok, thanks. I think it would be good to post these for reference
> in v3, with a clear indication that they are not being submitted
> for discussion/inclusion yet.

btw, don't look too closely at that tree yet.. where the
attach/detach is done in the videobuf2 code isn't really correct.  But I
was going to get something functioning first.

BR,
-R


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Rob Clark
On Mon, Dec 5, 2011 at 3:23 PM, Daniel Vetter  wrote:
> On Mon, Dec 05, 2011 at 02:46:47PM -0600, Rob Clark wrote:
>> On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann  wrote:
>> > In the patch 2, you have a section about migration that mentions that
>> > it is possible to export a buffer that can be migrated after it
>> > is already mapped into one user driver. How does that work when
>> > the physical addresses are mapped into a consumer device already?
>>
>> I think you can do physical migration if you are attached, but
>> probably not if you are mapped.
>
> Yeah, that's very much how I see this, and also why map/unmap (at least
> for simple users like v4l) should only bracket actual usage. GPU memory
> managers need to be able to move around buffers while no one is using
> them.
>
> [snip]
>
>> >> +	/* allow allocator to take care of cache ops */
>> >> +	void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
>> >> +	void (*sync_sg_for_device)(struct dma_buf *, struct device *);
>> >
>> > I don't see how this works with multiple consumers: For the streaming
>> > DMA mapping, there must be exactly one owner, either the device or
>> > the CPU. Obviously, this rule needs to be extended when you get to
>> > multiple devices and multiple device drivers, plus possibly user
>> > mappings. Simply assigning the buffer to "the device" from one
>> > driver does not block other drivers from touching the buffer, and
>> > assigning it to "the cpu" does not stop other hardware that the
>> > code calling sync_sg_for_cpu is not aware of.
>> >
>> > The only way to solve this that I can think of right now is to
>> > mandate that the mappings are all coherent (i.e. noncachable
>> > on noncoherent architectures like ARM). If you do that, you no
>> > longer need the sync_sg_for_* calls.
>>
>> My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
>> ioctls and corresponding dmabuf ops, which userspace is required to
>> call before / after CPU access.  Or just remove mmap() and do the
>> mmap() via allocating device and use that device's equivalent
>> DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
>> would give you a way to (a) synchronize with gpu/asynchronous
>> pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
>> buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
>> that gives you a convenient place to do cache operations on
>> noncoherent architecture.
>>
>> I sort of preferred having the DMABUF shim because that lets you pass
>> a buffer around userspace without the receiving code knowing about a
>> device specific API.  But the problem I eventually came around to: if
>> your GL stack (or some other userspace component) is batching up
>> commands before submission to kernel, the buffers you need to wait for
>> completion might not even be submitted yet.  So from kernel
>> perspective they are "ready" for cpu access.  Even though in fact they
>> are not in a consistent state from rendering perspective.  I don't
>> really know a sane way to deal with that.  Maybe the approach instead
>> should be a userspace level API (in libkms/libdrm?) to provide
>> abstraction for userspace access to buffers rather than dealing with
>> this at the kernel level.
>
> Well, there's a reason GL has an explicit flush and extensions for sync
> objects. It's to support such scenarios where the driver batches up gpu
> commands before actually submitting them.

Hmm.. what about other non-GL APIs..  maybe vaapi/vdpau or similar?
(Or something that I haven't thought of.)

> Also, recent gpus have all (or
> shortly will grow) multiple execution pipelines, so it's also important
> that you sync up with the right command stream. Syncing up with all of
> them is generally frowned upon for obvious reasons ;-)

Well, I guess I am happy enough with something that is at least
functional.  Userspace access would (I think) mainly be weird edge case
type stuff.  But...

> So any userspace that interacts with an OpenGL driver needs to take care
> of this anyway. But I think for simpler stuff (v4l) kernel only coherency
> should work and userspace just needs to take care of gl interactions and
> call glflush and friends at the right points. I think we can flesh this
> out precisely when we spec the dmabuf EGL extension ... (or implement one
> of the preexisting ones already around).

.. yeah, I think egl/eglImage extension would be the right place to
hide this behind.  And I guess your GL stack should be able to figure
out which execution pipeline to sync, cache state of buffer, and
whatever other optimizations you might want to make.

> On the topic of a coherency model for dmabuf, I think we need to look at
> dma_buf_attachment_map/unmap (and also the mmap variants cpu_start and
> cpu_finish or whatever they might get called) as barriers:
>
> So after a dma_buf_map, all previously completed dma operations (i.e.
> unmap already called) and any cpu writes (i.e. cpu_finish 

WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Jerome Glisse
On Mon, Dec 05, 2011 at 10:10:34PM +0200, Pekka Enberg wrote:
> On Mon, Dec 5, 2011 at 9:27 PM, Markus Trippelsdorf
>  wrote:
> >> > Yes the patch finally fixes the issue for me (tested with 120 kexec
> >> > iterations).
> >> > Thanks Jerome!
> >>
> >> Can you do a quick run on the modified patch?
> >
> > This one is also OK after ~60 iterations.
> 
> Jerome, could you please include a reference to this LKML thread for
> context and attribution for Markus for reporting and following up to
> get the issue fixed in the changelog?
> 
>   Pekka

Attached is the updated patch; only the changelog is different. Thanks
Markus for testing this.

Cheers,
Jerome


[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Semwal, Sumit
Hi Konrad,

On Fri, Dec 2, 2011 at 10:41 PM, Konrad Rzeszutek Wilk
 wrote:
> On Fri, Dec 02, 2011 at 02:27:31PM +0530, Sumit Semwal wrote:
>> This is the first step in defining a dma buffer sharing mechanism.
>>

>>
>> [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement
>> [2]: http://lwn.net/Articles/454389
>>
>> Signed-off-by: Sumit Semwal 
>> Signed-off-by: Sumit Semwal 
>
> You have a clone? You only need one SOB.
:) Thanks for your review - Well, not a clone, but I have two 'employers' :))

I have a rather weird reason for this - I am employed with Texas
Instruments, but working with Linaro as well. And due to some
'non-technical' reasons, I need to send this work from @ti.com mail
ID. At the same time, I would like to acknowledge that this work was
done as part of the Linaro umbrella, so I put another SOB @linaro.org.

>
>

>> + * Copyright(C) 2011 Linaro Limited. All rights reserved.
>> + * Author: Sumit Semwal 
>
> OK, so the SOB should be from @ti.com then.
>
>> + *

>> +static int dma_buf_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> +	struct dma_buf *dmabuf;
>> +
>> +	if (!is_dma_buf_file(file))
>> +		return -EINVAL;
>> +
>> +	dmabuf = file->private_data;
>> +
>
> Should you check if dmabuf is NULL and or dmabuf->ops is NULL too?
>
> Hm, you probably don't need to check for dmabuf, but from
> looking at dma_buf_export one could pass a NULL for the ops.
see next comment
>
>> +	if (!dmabuf->ops->mmap)
>> +		return -EINVAL;
>> +
>> +	return dmabuf->ops->mmap(dmabuf, vma);
>> +}
>> +
>> +static int dma_buf_release(struct inode *inode, struct file *file)
>> +{
>> +	struct dma_buf *dmabuf;
>> +
>> +	if (!is_dma_buf_file(file))
>> +		return -EINVAL;
>> +
>> +	dmabuf = file->private_data;
>> +
>
> No checking here for ops or ops->release?
Hmmm.. you're right, of course. for this common check in mmap and
release, I guess I'd add it to 'is_dma_buf_file()' helper [maybe call
it is_valid_dma_buf_file() or something similar]
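
Something along these lines (a sketch of the suggested helper; the name
and the exact checks are hypothetical):

static bool is_valid_dma_buf_file(struct file *file)
{
	struct dma_buf *dmabuf = file->private_data;

	/* fold the common sanity checks from mmap/release into one place */
	return file->f_op == &dma_buf_fops && dmabuf && dmabuf->ops;
}
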
>

>> +
>> +/**
>
> I don't think the ** is the current kernel doc format anymore.
thanks for catching this :) - will correct.
>
>> + * dma_buf_export - Creates a new dma_buf, and associates an anon file
>> + * with this buffer,so it can be exported.
>
> Put a space there.
ok
>
>> + * Also connect the allocator specific data and ops to the buffer.
>> + *
>> + * @priv:	[in]	Attach private data of allocator to this buffer
>> + * @ops:	[in]	Attach allocator-defined dma buf ops to the new buffer.
>> + * @flags:	[in]	mode flags for the file.
>> + *
>> + * Returns, on success, a newly created dma_buf object, which wraps the
>> + * supplied private data and operations for dma_buf_ops. On failure to
>> + * allocate the dma_buf object, it can return NULL.
>
> "it can" I think the right word is "it will".
Right.
>
>> + *
>> + */
>> +struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,
>> +				int flags)
>> +{
>> +	struct dma_buf *dmabuf;
>> +	struct file *file;
>> +
>> +	BUG_ON(!priv || !ops);
>
> Whoa. Crash the whole kernel b/c of this? No no. You should
> use WARN_ON and just return NULL.
ok
>
>> +
>> +	dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);
>> +	if (dmabuf == NULL)
>> +		return dmabuf;
>
> Hmm, why not return ERR_PTR(-ENOMEM); ?
ok
>
>> +
>> +	dmabuf->priv = priv;
>> +	dmabuf->ops = ops;
>> +
>> +	file = anon_inode_getfile("dmabuf", &dma_buf_fops, dmabuf, flags);
>> +
>> +	dmabuf->file = file;
>> +
>> +	mutex_init(&dmabuf->lock);
>> +	INIT_LIST_HEAD(&dmabuf->attachments);
>> +
>> +	return dmabuf;
>> +}
>> +EXPORT_SYMBOL(dma_buf_export);
>
> _GPL ?
sure; will change it.
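
Taken together, one way to combine the WARN_ON and ERR_PTR suggestions
at the entry of dma_buf_export would be (a sketch of the agreed
direction, not the reposted patch):

	struct dma_buf *dmabuf;

	if (WARN_ON(!priv || !ops))	/* warn, don't crash the kernel */
		return ERR_PTR(-EINVAL);

	dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);
	if (dmabuf == NULL)
		return ERR_PTR(-ENOMEM);	/* not NULL, per review */
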
>
>> +
>> +
>> +/**
>> + * dma_buf_fd - returns a file descriptor for the given dma_buf
>> + * @dmabuf:	[in]	pointer to dma_buf for which fd is required.
>> + *
>> + * On success, returns an associated 'fd'. Else, returns error.
>> + */
>> +int dma_buf_fd(struct dma_buf *dmabuf)
>> +{
>> +	int error, fd;
>> +
>
> Should you check if dmabuf is NULL first?
yes.
>
>> +	if (!dmabuf->file)
>> +		return -EINVAL;
>> +
>> +	error = get_unused_fd_flags(0);
>> +	if (error < 0)
>> +		return error;
>> +	fd = error;
>> +
>> +	fd_install(fd, dmabuf->file);
>> +
>> +	return fd;
>> +}
>> +EXPORT_SYMBOL(dma_buf_fd);
>
> GPL?
sure; will change it.
>> +
>> +/**
>> + * dma_buf_get - returns the dma_buf structure related to an fd
>> + * @fd:	[in]	fd associated with the dma_buf to be returned
>> + *
>> + * On success, returns the dma_buf structure associated with an fd; uses
>> + * file's refcounting done by fget to increase refcount. returns ERR_PTR
>> + * otherwise.
>> + */
>> +struct dma_buf *dma_buf_get(int fd)
>> +{
>> +	struct file *file;
>> +
>> +	file = fget(fd);
>> +
>> +	if (!file)
>> +		return ERR_PTR(-EBADF);
>> +
>> +	if (!is_dma_buf_file(file)) {
>> + 

[RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Rob Clark
On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann  wrote:
> On Friday 02 December 2011, Sumit Semwal wrote:
>> This is the first step in defining a dma buffer sharing mechanism.
>
> This looks very nice, but there are a few things I don't understand yet
> and a bunch of trivial comments I have about things I spotted.
>
> Do you have prototype exporter and consumer drivers that you can post
> for clarification?

There are some dummy drivers based on an earlier version.  And airlied
has a prime (multi-gpu) prototype:

http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

I've got a nearly working camera+display prototype:

https://github.com/robclark/kernel-omap4/commits/dmabuf

> In the patch 2, you have a section about migration that mentions that
> it is possible to export a buffer that can be migrated after it
> is already mapped into one user driver. How does that work when
> the physical addresses are mapped into a consumer device already?

I think you can do physical migration if you are attached, but
probably not if you are mapped.

>> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
>> index 21cf46f..07d8095 100644
>> --- a/drivers/base/Kconfig
>> +++ b/drivers/base/Kconfig
>> @@ -174,4 +174,14 @@ config SYS_HYPERVISOR
>>
>>  source "drivers/base/regmap/Kconfig"
>>
>> +config DMA_SHARED_BUFFER
>> +	bool "Buffer framework to be shared between drivers"
>> +	default n
>> +	depends on ANON_INODES
>
> I would make this 'select ANON_INODES', like the other users of this
> feature.
>
>> + ? ? return dmabuf;
>> +}
>> +EXPORT_SYMBOL(dma_buf_export);
>
> I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
> because it's really a low-level function that I would expect
> to get used by in-kernel subsystems providing the feature to
> users and having back-end drivers, but it's not the kind of thing
> we want out-of-tree drivers to mess with.
>
>> +/**
>> + * dma_buf_fd - returns a file descriptor for the given dma_buf
>> + * @dmabuf:	[in]	pointer to dma_buf for which fd is required.
>> + *
>> + * On success, returns an associated 'fd'. Else, returns error.
>> + */
>> +int dma_buf_fd(struct dma_buf *dmabuf)
>> +{
>> +	int error, fd;
>> +
>> +	if (!dmabuf->file)
>> +		return -EINVAL;
>> +
>> +	error = get_unused_fd_flags(0);
>
> Why not simply get_unused_fd()?
>
>> +/**
>> + * dma_buf_attach - Add the device to dma_buf's attachments list; 
>> optionally,
>> + * calls attach() of dma_buf_ops to allow device-specific attach 
>> functionality
>> + * @dmabuf:	[in]	buffer to attach device to.
>> + * @dev:	[in]	device to be attached.
>> + *
>> + * Returns struct dma_buf_attachment * for this attachment; may return NULL.
>> + *
>
> Or may return a negative error code. It's better to be consistent here:
> either always return NULL on error, or change the allocation error to
> ERR_PTR(-ENOMEM).
>
>> + */
>> +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
>> +					   struct device *dev)
>> +{
>> +	struct dma_buf_attachment *attach;
>> +	int ret;
>> +
>> +	BUG_ON(!dmabuf || !dev);
>> +
>> +	attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
>> +	if (attach == NULL)
>> +		goto err_alloc;
>> +
>> +	mutex_lock(&dmabuf->lock);
>> +
>> +	attach->dev = dev;
>> +	attach->dmabuf = dmabuf;
>> +	if (dmabuf->ops->attach) {
>> +		ret = dmabuf->ops->attach(dmabuf, dev, attach);
>> +		if (!ret)
>> +			goto err_attach;
>
> You probably mean "if (ret)" here instead of "if (!ret)", right?
>
>> +	/* allow allocator to take care of cache ops */
>> +	void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
>> +	void (*sync_sg_for_device)(struct dma_buf *, struct device *);
>
> I don't see how this works with multiple consumers: For the streaming
> DMA mapping, there must be exactly one owner, either the device or
> the CPU. Obviously, this rule needs to be extended when you get to
> multiple devices and multiple device drivers, plus possibly user
> mappings. Simply assigning the buffer to "the device" from one
> driver does not block other drivers from touching the buffer, and
> assigning it to "the cpu" does not stop other hardware that the
> code calling sync_sg_for_cpu is not aware of.
>
> The only way to solve this that I can think of right now is to
> mandate that the mappings are all coherent (i.e. noncachable
> on noncoherent architectures like ARM). If you do that, you no
> longer need the sync_sg_for_* calls.

My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
ioctls and corresponding dmabuf ops, which userspace is required to
call before / after CPU access.  Or just remove mmap() and do the
mmap() via allocating device and use that device's equivalent
DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
would give you a way to (a) synchronize with 

drm: Branch 'master' - 2 commits

2011-12-05 Thread Eric Anholt
On Mon,  5 Dec 2011 02:31:58 -0800 (PST), ickle at kemper.freedesktop.org 
(Chris Wilson) wrote:
>  configure.ac |2 +-
>  intel/intel_bufmgr_gem.c |   27 +--
>  2 files changed, 22 insertions(+), 7 deletions(-)
> 
> New commits:
> commit e73161a02b604742e3da3bca8f13cff81276de43
> Author: Chris Wilson 
> Date:   Mon Dec 5 10:30:52 2011 +
> 
> configure: Bump version to 2.4.28
> 
> So that we can pull a couple of Intel bug fixes into xf86-video-intel.
> 
> Signed-off-by: Chris Wilson 

Performance before:
[  0]   glfirefox-talos-gfx   17.866   17.915   0.14%4/4
after:
[  0]   glfirefox-talos-gfx   22.173   22.251   0.20%4/4

There's a pretty obvious opportunity to keep the performance win of the
userspace caching that you've broken here, but you gave none of us a
chance to review it before you pushed the patch *and shipped a release
with it*.  This is not acceptable.  Please revert and bump the release,
and fix it correctly.


WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Jerome Glisse
On Mon, Dec 05, 2011 at 07:15:49PM +0100, Markus Trippelsdorf wrote:
> On 2011.12.05 at 12:10 -0500, Jerome Glisse wrote:
> > On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
> > > On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
> > > > On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
> > > >  wrote:
> > > > > On 2011.12.03 at 12:20 +, Dave Airlie wrote:
> > > > >> >> > > > > FIX idr_layer_cache: Marking all objects used
> > > > >> >> > > >
> > > > >> >> > > > Yesterday I couldn't reproduce the issue at all. But today 
> > > > >> >> > > > I've hit
> > > > >> >> > > > exactly the same spot again. (CCing the drm list)
> > > > >>
> > > > >> If I had to guess it looks like 0 is getting written back to some
> > > > >> random page by the GPU maybe, it could be that the GPU is in some 
> > > > >> half
> > > > >> setup state at boot or on a reboot does it happen from a cold boot or
> > > > >> just warm boot or kexec?
> > > > >
> > > > > Only happened with kexec thus far. Cold boot seems to be fine.
> > > > >
> > > > 
> > > > Can you add radeon.no_wb=1 to your kexec kernel parameter and see if
> > > > you can reproduce.
> > > 
> > > No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
> > > after 700 successful kexec iterations...)
> > > 
> > 
> > Can you try if the attached patch fixes the issue when you don't pass the
> > radeon.no_wb=1 option ?
> 
> Yes the patch finally fixes the issue for me (tested with 120 kexec
> iterations).
> Thanks Jerome!
> 
> -- 
> Markus

Can you do a quick run on the modified patch?

I believe this patch could go to stable too as it's low
impact from my pov.

Cheers,
Jerome


WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Jerome Glisse
On Mon, Dec 5, 2011 at 1:15 PM, Markus Trippelsdorf
 wrote:
> On 2011.12.05 at 12:10 -0500, Jerome Glisse wrote:
>> On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
>> > On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
>> > > On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
>> > >  wrote:
>> > > > On 2011.12.03 at 12:20 +, Dave Airlie wrote:
>> > > >> >> > > > > FIX idr_layer_cache: Marking all objects used
>> > > >> >> > > >
>> > > >> >> > > > Yesterday I couldn't reproduce the issue at all. But today 
>> > > >> >> > > > I've hit
>> > > >> >> > > > exactly the same spot again. (CCing the drm list)
>> > > >>
>> > > >> If I had to guess it looks like 0 is getting written back to some
>> > > >> random page by the GPU maybe, it could be that the GPU is in some half
>> > > >> setup state at boot or on a reboot does it happen from a cold boot or
>> > > >> just warm boot or kexec?
>> > > >
>> > > > Only happened with kexec thus far. Cold boot seems to be fine.
>> > > >
>> > >
>> > > Can you add radeon.no_wb=1 to your kexec kernel parameter and see if
>> > > you can reproduce.
>> >
>> > No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
>> > after 700 successful kexec iterations...)
>> >
>>
>> Can you try if the attached patch fixes the issue when you don't pass the
>> radeon.no_wb=1 option ?
>
> Yes the patch finally fixes the issue for me (tested with 120 kexec
> iterations).
> Thanks Jerome!
>
> --
> Markus

Will respin with some minor code changes.

Cheers,
Jerome


WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Jerome Glisse
On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
> On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
> > On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
> >  wrote:
> > > On 2011.12.03 at 12:20 +, Dave Airlie wrote:
> > >> >> > > > > FIX idr_layer_cache: Marking all objects used
> > >> >> > > >
> > >> >> > > > Yesterday I couldn't reproduce the issue at all. But today I've 
> > >> >> > > > hit
> > >> >> > > > exactly the same spot again. (CCing the drm list)
> > >>
> > >> If I had to guess it looks like 0 is getting written back to some
> > >> random page by the GPU maybe, it could be that the GPU is in some half
> > >> setup state at boot or on a reboot does it happen from a cold boot or
> > >> just warm boot or kexec?
> > >
> > > Only happened with kexec thus far. Cold boot seems to be fine.
> > >
> > > --
> > > Markus
> > 
> > Can you add radeon.no_wb=1 to your kexec kernel parameter and see if
> > you can reproduce.
> 
> No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
> after 700 successful kexec iterations...)
> 
> -- 
> Markus

Can you try if the attached patch fixes the issue when you don't pass the
radeon.no_wb=1 option ?

Cheers,
Jerome


[PATCH] drm/radeon: disable possible GPU writeback early

2011-12-05 Thread Jerome Glisse
Given how kexec works we need to disable any kind of GPU writeback
early in GPU initialization just in case some are still active from
previous setup.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/evergreen.c |7 ++
 drivers/gpu/drm/radeon/ni.c|9 +++
 drivers/gpu/drm/radeon/nid.h   |   19 
 drivers/gpu/drm/radeon/r100.c  |7 ++
 drivers/gpu/drm/radeon/r300.c  |7 ++
 drivers/gpu/drm/radeon/r300d.h |   21 ++
 drivers/gpu/drm/radeon/r420.c  |7 ++
 drivers/gpu/drm/radeon/r420d.h |   42 
 drivers/gpu/drm/radeon/r520.c  |   10 
 drivers/gpu/drm/radeon/r520d.h |   24 
 drivers/gpu/drm/radeon/r600.c  |7 ++
 drivers/gpu/drm/radeon/rs400.c |7 ++
 drivers/gpu/drm/radeon/rs400d.h|   21 ++
 drivers/gpu/drm/radeon/rs600.c |   10 
 drivers/gpu/drm/radeon/rs600d.h|   21 ++
 drivers/gpu/drm/radeon/rs690.c |   10 
 drivers/gpu/drm/radeon/rs690d.h|   24 
 drivers/gpu/drm/radeon/rv515.c |   10 
 drivers/gpu/drm/radeon/rv515d.h|   24 
 drivers/gpu/drm/radeon/rv770.c |7 ++
 drivers/gpu/drm/radeon/rv770d.h|   20 +
 21 files changed, 314 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index 1934728..d49596b 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3249,6 +3249,13 @@ int evergreen_init(struct radeon_device *rdev)
 {
int r;

+   /* stop possible GPU activities */
+   WREG32(IH_RB_CNTL, 0);
+   WREG32(IH_CNTL, 0);
+   WREG32(CP_ME_CNTL, CP_ME_HALT | CP_PFP_HALT);
+   WREG32(SCRATCH_UMSK, 0);
+   WREG32(CP_RB_CNTL, RB_NO_UPDATE);
+
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index c15fc8b..2a00ad1 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1577,6 +1577,15 @@ int cayman_init(struct radeon_device *rdev)
	struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
int r;

+   /* stop possible GPU activities */
+   WREG32(IH_RB_CNTL, 0);
+   WREG32(IH_CNTL, 0);
+   WREG32(CP_ME_CNTL, CP_ME_HALT | CP_PFP_HALT);
+   WREG32(SCRATCH_UMSK, 0);
+   WREG32(CP_RB0_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB1_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB2_CNTL, RB_NO_UPDATE);
+
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/nid.h b/drivers/gpu/drm/radeon/nid.h
index 4640334..3aa33c6 100644
--- a/drivers/gpu/drm/radeon/nid.h
+++ b/drivers/gpu/drm/radeon/nid.h
@@ -162,6 +162,25 @@
#define	HDP_MISC_CNTL				0x2F4C
#define	HDP_FLUSH_INVALIDATE_CACHE		(1 << 0)

+#define IH_RB_CNTL				0x3e00
+#   define IH_RB_ENABLE   (1 << 0)
+#   define IH_IB_SIZE(x)  ((x) << 1) /* log2 */
+#   define IH_RB_FULL_DRAIN_ENABLE(1 << 6)
+#   define IH_WPTR_WRITEBACK_ENABLE   (1 << 8)
+#   define IH_WPTR_WRITEBACK_TIMER(x) ((x) << 9) /* log2 */
+#   define IH_WPTR_OVERFLOW_ENABLE(1 << 16)
+#   define IH_WPTR_OVERFLOW_CLEAR (1 << 31)
+#define IH_CNTL   0x3e18
+#   define ENABLE_INTR(1 << 0)
+#   define IH_MC_SWAP(x)  ((x) << 1)
+#   define IH_MC_SWAP_NONE0
+#   define IH_MC_SWAP_16BIT   1
+#   define IH_MC_SWAP_32BIT   2
+#   define IH_MC_SWAP_64BIT   3
+#   define RPTR_REARM (1 << 4)
+#   define MC_WRREQ_CREDIT(x) ((x) << 15)
+#   define MC_WR_CLEAN_CNT(x) ((x) << 20)
+
#define	CC_SYS_RB_BACKEND_DISABLE		0x3F88
#define	GC_USER_SYS_RB_BACKEND_DISABLE		0x3F8C
#define	CGTS_SYS_TCC_DISABLE			0x3F90
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 657040b..8a71502 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -4010,6 +4010,13 @@ int r100_init(struct radeon_device *rdev)
 {
int r;

+   /* stop possible GPU activities */
+   WREG32(RADEON_CP_CSQ_MODE, 0);
+   WREG32(RADEON_CP_CSQ_CNTL, 0);
+   WREG32(R_000770_SCRATCH_UMSK, 0);
+   WREG32(RADEON_CP_RB_CNTL, 

[PATCH] drm/radeon: disable possible GPU writeback early v2

2011-12-05 Thread Jerome Glisse
Given how kexec works we need to disable any kind of GPU writeback
early in GPU initialization just in case some are still active from
previous setup.

v2: follow previous sanity work done on earlier radeons; also write
regs unconditionally and disable irq too.

Signed-off-by: Jerome Glisse 
---
 drivers/gpu/drm/radeon/evergreen.c   |2 ++
 drivers/gpu/drm/radeon/ni.c  |   18 ++
 drivers/gpu/drm/radeon/nid.h |   19 +++
 drivers/gpu/drm/radeon/r100.c|   20 ++--
 drivers/gpu/drm/radeon/r520.c|2 +-
 drivers/gpu/drm/radeon/r600.c|   16 
 drivers/gpu/drm/radeon/radeon_asic.h |2 ++
 drivers/gpu/drm/radeon/rs600.c   |   20 +++-
 drivers/gpu/drm/radeon/rs600d.h  |   21 +
 drivers/gpu/drm/radeon/rs690.c   |2 +-
 drivers/gpu/drm/radeon/rv515.c   |2 +-
 drivers/gpu/drm/radeon/rv770.c   |   16 
 drivers/gpu/drm/radeon/rv770d.h  |   20 
 13 files changed, 142 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index 1934728..6109579 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3249,6 +3249,8 @@ int evergreen_init(struct radeon_device *rdev)
 {
int r;

+   /* restore some register to sane defaults */
+   rv770_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index c15fc8b..f5d7054 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1566,6 +1566,22 @@ int cayman_suspend(struct radeon_device *rdev)
return 0;
 }

+/*
+ * Due to how kexec works, it can leave the hw fully initialised when it
+ * boots the new kernel.
+ */
+static void cayman_restore_sanity(struct radeon_device *rdev)
+{
+   /* stop possible GPU activities */
+   WREG32(IH_RB_CNTL, 0);
+   WREG32(IH_CNTL, 0);
+   WREG32(CP_ME_CNTL, CP_ME_HALT | CP_PFP_HALT);
+   WREG32(SCRATCH_UMSK, 0);
+   WREG32(CP_RB0_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB1_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB2_CNTL, RB_NO_UPDATE);
+}
+
 /* Plan is to move initialization in that function and use
  * helper function so that radeon_device_init pretty much
  * do nothing more than calling asic specific function. This
@@ -1577,6 +1593,8 @@ int cayman_init(struct radeon_device *rdev)
	struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
int r;

+   /* restore some register to sane defaults */
+   cayman_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/nid.h b/drivers/gpu/drm/radeon/nid.h
index 4640334..3aa33c6 100644
--- a/drivers/gpu/drm/radeon/nid.h
+++ b/drivers/gpu/drm/radeon/nid.h
@@ -162,6 +162,25 @@
#define	HDP_MISC_CNTL				0x2F4C
#define	HDP_FLUSH_INVALIDATE_CACHE		(1 << 0)

+#define IH_RB_CNTL				0x3e00
+#   define IH_RB_ENABLE   (1 << 0)
+#   define IH_IB_SIZE(x)  ((x) << 1) /* log2 */
+#   define IH_RB_FULL_DRAIN_ENABLE(1 << 6)
+#   define IH_WPTR_WRITEBACK_ENABLE   (1 << 8)
+#   define IH_WPTR_WRITEBACK_TIMER(x) ((x) << 9) /* log2 */
+#   define IH_WPTR_OVERFLOW_ENABLE(1 << 16)
+#   define IH_WPTR_OVERFLOW_CLEAR (1 << 31)
+#define IH_CNTL   0x3e18
+#   define ENABLE_INTR(1 << 0)
+#   define IH_MC_SWAP(x)  ((x) << 1)
+#   define IH_MC_SWAP_NONE0
+#   define IH_MC_SWAP_16BIT   1
+#   define IH_MC_SWAP_32BIT   2
+#   define IH_MC_SWAP_64BIT   3
+#   define RPTR_REARM (1 << 4)
+#   define MC_WRREQ_CREDIT(x) ((x) << 15)
+#   define MC_WR_CLEAN_CNT(x) ((x) << 20)
+
#define	CC_SYS_RB_BACKEND_DISABLE		0x3F88
#define	GC_USER_SYS_RB_BACKEND_DISABLE		0x3F8C
#define	CGTS_SYS_TCC_DISABLE			0x3F90
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 657040b..d58531f 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -3990,20 +3990,12 @@ void r100_fini(struct radeon_device *rdev)
  */
 void r100_restore_sanity(struct radeon_device *rdev)
 {
-   u32 tmp;
-
-   tmp = RREG32(RADEON_CP_CSQ_CNTL);
-   if (tmp) {
-   

[PATCH] drm/radeon: disable possible GPU writeback early v3

2011-12-05 Thread Jerome Glisse
Given how kexec works we need to disable any kind of GPU writeback
early in GPU initialization just in case some are still active from
previous setup. This patch is done to fix the issue described in
the lkml thread :

WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

https://lkml.org/lkml/2011/12/5/466

Thanks to Markus Trippelsdorf for testing this.

v2: follow previous sanity work done on earlier radeons; also write
regs unconditionally and disable irq too.
v3: update changelog

Signed-off-by: Jerome Glisse 
Tested-by: Markus Trippelsdorf 
---
 drivers/gpu/drm/radeon/evergreen.c   |2 ++
 drivers/gpu/drm/radeon/ni.c  |   18 ++
 drivers/gpu/drm/radeon/nid.h |   19 +++
 drivers/gpu/drm/radeon/r100.c|   20 ++--
 drivers/gpu/drm/radeon/r520.c|2 +-
 drivers/gpu/drm/radeon/r600.c|   16 
 drivers/gpu/drm/radeon/radeon_asic.h |2 ++
 drivers/gpu/drm/radeon/rs600.c   |   20 +++-
 drivers/gpu/drm/radeon/rs600d.h  |   21 +
 drivers/gpu/drm/radeon/rs690.c   |2 +-
 drivers/gpu/drm/radeon/rv515.c   |2 +-
 drivers/gpu/drm/radeon/rv770.c   |   16 
 drivers/gpu/drm/radeon/rv770d.h  |   20 
 13 files changed, 142 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index 1934728..6109579 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3249,6 +3249,8 @@ int evergreen_init(struct radeon_device *rdev)
 {
int r;

+   /* restore some register to sane defaults */
+   rv770_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index c15fc8b..f5d7054 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1566,6 +1566,22 @@ int cayman_suspend(struct radeon_device *rdev)
return 0;
 }

+/*
+ * Due to how kexec works, it can leave the hw fully initialised when it
+ * boots the new kernel.
+ */
+static void cayman_restore_sanity(struct radeon_device *rdev)
+{
+   /* stop possible GPU activities */
+   WREG32(IH_RB_CNTL, 0);
+   WREG32(IH_CNTL, 0);
+   WREG32(CP_ME_CNTL, CP_ME_HALT | CP_PFP_HALT);
+   WREG32(SCRATCH_UMSK, 0);
+   WREG32(CP_RB0_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB1_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB2_CNTL, RB_NO_UPDATE);
+}
+
 /* Plan is to move initialization in that function and use
  * helper function so that radeon_device_init pretty much
  * do nothing more than calling asic specific function. This
@@ -1577,6 +1593,8 @@ int cayman_init(struct radeon_device *rdev)
	struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
int r;

+   /* restore some register to sane defaults */
+   cayman_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/nid.h b/drivers/gpu/drm/radeon/nid.h
index 4640334..3aa33c6 100644
--- a/drivers/gpu/drm/radeon/nid.h
+++ b/drivers/gpu/drm/radeon/nid.h
@@ -162,6 +162,25 @@
#define	HDP_MISC_CNTL				0x2F4C
#define	HDP_FLUSH_INVALIDATE_CACHE		(1 << 0)

+#define IH_RB_CNTL				0x3e00
+#   define IH_RB_ENABLE   (1 << 0)
+#   define IH_IB_SIZE(x)  ((x) << 1) /* log2 */
+#   define IH_RB_FULL_DRAIN_ENABLE(1 << 6)
+#   define IH_WPTR_WRITEBACK_ENABLE   (1 << 8)
+#   define IH_WPTR_WRITEBACK_TIMER(x) ((x) << 9) /* log2 */
+#   define IH_WPTR_OVERFLOW_ENABLE(1 << 16)
+#   define IH_WPTR_OVERFLOW_CLEAR (1 << 31)
+#define IH_CNTL   0x3e18
+#   define ENABLE_INTR(1 << 0)
+#   define IH_MC_SWAP(x)  ((x) << 1)
+#   define IH_MC_SWAP_NONE0
+#   define IH_MC_SWAP_16BIT   1
+#   define IH_MC_SWAP_32BIT   2
+#   define IH_MC_SWAP_64BIT   3
+#   define RPTR_REARM (1 << 4)
+#   define MC_WRREQ_CREDIT(x) ((x) << 15)
+#   define MC_WR_CLEAN_CNT(x) ((x) << 20)
+
#define	CC_SYS_RB_BACKEND_DISABLE		0x3F88
#define	GC_USER_SYS_RB_BACKEND_DISABLE		0x3F8C
#define	CGTS_SYS_TCC_DISABLE			0x3F90
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 657040b..d58531f 100644
--- 

[ANNOUNCE] libdrm 2.4.28

2011-12-05 Thread Chris Wilson
xf86-video-intel depends upon a couple of bug fixes in libdrm, hence the
version bump.
-Chris

Chris Wilson (2):
  intel: Unmap buffers during drm_intel_gem_bo_unmap
  configure: Bump version to 2.4.28

Daniel Vetter (1):
  intel: limit aperture space to mappable area on gen3

Jeremy Huddleston (1):
  Fix compilation with -Werror=int-to-pointer-cast 
-Werror=pointer-to-int-cast

Jerome Glisse (1):
  tests/radeon: radeon specific test

Maarten Lankhorst (1):
  nouveau: Mark nouveau subchannel unbound nouveau_grobj_free

git tag: 2.4.28

http://dri.freedesktop.org/libdrm/libdrm-2.4.28.tar.bz2
MD5:  6488f64119c6439fa4038e9cd7b30b67  libdrm-2.4.28.tar.bz2
SHA1: cb9f4e94d4ff4ec9ebf1066ef5f34c5ca63b8c38  libdrm-2.4.28.tar.bz2
SHA256: 315dc3a087b85b12559394742bc0b52f1877ee04a7b9b940241fd68da4d7244a  
libdrm-2.4.28.tar.bz2

http://dri.freedesktop.org/libdrm/libdrm-2.4.28.tar.gz
MD5:  a057daa3172033c4b573a63dc72c5813  libdrm-2.4.28.tar.gz
SHA1: acdeff3d42b56204d6a15b5e1cbf65bf3f42ae0e  libdrm-2.4.28.tar.gz
SHA256: 74b14d00d6e99b1867f7a5a74b1109d2b30a63ea4f52a961e54f7b01a48d768e  
libdrm-2.4.28.tar.gz

-- 
Chris Wilson, Intel Open Source Technology Centre


[PATCH 00/23] kill drm cruft with fire

2011-12-05 Thread Daniel Vetter
On Wed, Nov 16, 2011 at 05:17:12PM +0100, Daniel Vetter wrote:
> On Mon, Nov 14, 2011 at 17:10, James Simmons  
> wrote:
> >> > Should I test this set of patches for the VIA driver or wait until you
> >> > have a second version of this patch?
> >>
> >> Testing this on via would be awesome! Iirc I haven't changed anything in
> >> the via specific patches, but if it's more convenient you can also
> >> directly test my branch:
> >>
> >> http://cgit.freedesktop.org/~danvet/drm/log/?h=kill-with-fire
> >
> > Okay I tried the patches and it locked up the openchrome X server. I'm
> > going to try your branch tonight to see if it makes any difference. If it
> > still fails I will have to track down what the problem is.
>
> If you can bisect the issue, that would be awesome. Meanwhile my sis
> card arrived, so I'm hopefully get around to test that part of the
> series rsn. I'm traveling atm though, so response time will suffer a
> bit.

Any updates on testing results?

Yours, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[Dual-LVDS Acer Iconia laptop] i915/DRM issue: one screen stays off [3.2-rc4+]

2011-12-05 Thread joeyli
Add Cc. to platform-driver-x86 and linux-acpi

Hi Baptiste

On Sun, 2011-12-04 at 17:07 +0100, Baptiste Jonglez wrote:
> Hi,
> 
> I've got a lot of trouble with a dual-LVDS Acer laptop (it doesn't
> have a keyboard, but two displays with touchscreens)
> 
> The Intel GPU is integrated into the Core i5-480M CPU: it's a bit
> older than Sandybridge, as it seems to be based on the Arrandale
> micro-architecture.
> 
> In the BIOS, both displays work fine; but as soon as the kernel boots
> up, the second display (i.e. the one where you usually find a
> keyboard) is turned off. The main display works as expected.
> 
> xrandr reports two LVDS displays: LVDS1, which is connected, and
> LVDS2, which is marked as "disconnected". No matter what I tried, I
> can't bring that second display up.
> 
> During the boot, just after the drm is set up, the following message
> shows up:
> 
>   [drm:intel_dsm_pci_probe] *ERROR* failed to get supported _DSM functions
> 
> (attached is the relevant part of dmesg [1])
> 
> 

I have no idea about this _DSM error; we need help from the drm and acpi experts.

> 
> I then tried booting with "video=LVDS-2:e". The same message shows up
> while booting, with these two following:
> 
>   [drm:drm_crtc_helper_set_config] *ERROR* failed to set mode on [CRTC:4]
>   fbcon_init: detected unhandled fb_set_par error, error code -22
> 
> (attached is the relevant part of dmesg [2])
> 
> With that kernel command line forcing LVDS2, the
> "drm_crtc_helper_set_config" error shows up each time I switch tty;
> additionally, X does not want to start anymore (spewing out the
> aforementioned error multiple times before giving up)
> 
> 
> I'm currently using the latest 3.2 kernel from linus' tree
> (af968e29acd91ebeb4224e899202c46c93171ecd), but the behavior was
> similar with a vanilla 3.1.2.
> 
> 
> Other notes about this issue:
> 
>  - with an Ubuntu 2.6.35 kernel, the second display is on but
>flickering (with the picture distorted like an old analog TV...).
>The main display is working fine, as always.
> 
>  - with an Archlinux 2.6.37.5 kernel, the behavior is the same as with
>3.2, the main display is ok and the second one is off.
> 
>  - I did succeed, only once and out of pure luck, to get the second
>screen to work with the 3.1.2 kernel. I haven't been able to
>reproduce that... I had booted with "video=LVDS-2:e" and let the
>laptop running ; pressing a key a few hours later turned back
>*both* displays on (the main display had been turned off by DPMS,
>and the second, well, was off from the start, as always)
>While not very helpful, it shows that it's definitely possible.
> 

What is the behavior on the Windows platform? Is there any physical key
that can turn the second LVDS on/off on Windows?

>  - there are a some unhandled WMI events logged from the acer-wmi
>module [3] when closing the lid, opening it, and most importantly,
>when the (main) screen is turned on or off by DPMS.
> 

I will look at your DSDT and the log from acer-wmi, then try to improve
acer-wmi.

> 
> 
> What do you think? I haven't really succeeded in nailing the source of
> the issue down, but here are a few possibilities I'm thinking of:
> 
>  - the driver is not aware it can drive two LVDS displays (not very
>likely, and it has worked once, see above)
> 
>  - there is some kind of switch that is able to turn the second screen
>on or off (I'm thinking of something like rfkill). If so, it looks
>like something non-standard and undocumented. This would explain
>the WMI events (see the last note above)
> 

What's the behavior of Windows?

>  - buggy ACPI implementation. I tried to extract then recompile the
>DSDT [4], and iasl spews out 17 errors and 12 warnings. Also worth
>noticing is that line in dmesg:
> "pci:00: ACPI _OSC request failed (AE_ERROR), returned control mask: 0x1d"
> 
> 
> The Archlinux userland is:
>  - libdrm 2.4.27
>  - xorg-server 1.11.2
>  - intel-dri 7.11.1
>  - xf86-video-intel 2.17.0
> 
> 
> Please let me know if there are any other details I should provide.
> Regards,
> Baptiste
> 
> Attachments:
> [1] dmesg-DSM-functions.log - drm errors when booting normally
> [2] dmesg-video-lvds2.log - drm errors when forcing LVDS2 on the cmdline
> [3] acer_wmi.log - WMI events that land in dmesg
> [4] dsdt - /sys/firmware/acpi/tables/DSDT

Please also attach the dmidecode log.


Thanks a lot!
Joey Lee



WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread David Laight

> > If I had to guess it looks like 0 is getting written back to some
> > random page by the GPU maybe, it could be that the GPU is in some
half
> > setup state at boot or on a reboot does it happen from a cold boot
or
> > just warm boot or kexec?
> 
> Only happened with kexec thus far. Cold boot seems to be fine.

Sounds like the GPU is writing to physical memory from the
old mappings.
This can happen to other devices if they aren't completely
disabled - which may not happen since the kexec case probably
avoids some of the hardware resets that occur during a normal
reboot.

I remember an ethernet chip writing into its rx ring/buffer
area following a reboot (and reinstall!) when connected
to a quiet lan.

David




i915: eDP regression

2011-12-05 Thread Adam Jackson
On Sat, 2011-12-03 at 19:35 +0200, Kirill A. Shutemov wrote:
> Hi,
> 
> Commit dc22ee6 introduces regression on my laptop HP EliteBook 8440p.  I see
> nothing on the panel after mode setting. Reverting the commit fixes the issue.

Try this patch (might need rediffing):

http://www.mail-archive.com/intel-gfx at lists.freedesktop.org/msg05889.html

- ajax


[PATCH] vmwgfx: Use kcalloc instead of kzalloc to allocate array

2011-12-05 Thread Jakob Bornecrantz
Reviewed-by: Jakob Bornecrantz 

- Original Message -
> The advantage of kcalloc is that it prevents the integer overflows
> that could result from multiplying the number of elements by the
> element size, and it is also a bit nicer to read.
> 
> The semantic patch that makes this change is available
> in https://lkml.org/lkml/2011/11/25/107
> 
> Signed-off-by: Thomas Meyer 
> ---
> 
> diff -u -p a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c 2011-11-13
> 11:07:24.343455126 +0100
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c 2011-11-28
> 19:50:07.413502824 +0100
> @@ -140,7 +140,7 @@ int vmw_present_ioctl(struct drm_device
>   goto out_clips;
>   }
>  
> - clips = kzalloc(num_clips * sizeof(*clips), GFP_KERNEL);
> + clips = kcalloc(num_clips, sizeof(*clips), GFP_KERNEL);
>   if (clips == NULL) {
>   DRM_ERROR("Failed to allocate clip rect list.\n");
>   ret = -ENOMEM;
> @@ -232,7 +232,7 @@ int vmw_present_readback_ioctl(struct dr
>   goto out_clips;
>   }
>  
> - clips = kzalloc(num_clips * sizeof(*clips), GFP_KERNEL);
> + clips = kcalloc(num_clips, sizeof(*clips), GFP_KERNEL);
>   if (clips == NULL) {
>   DRM_ERROR("Failed to allocate clip rect list.\n");
>   ret = -ENOMEM;

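The overflow protection referred to above is the check inside kcalloc
itself; conceptually (a simplified sketch of the generic helper, not the
actual slab code):

static inline void *kcalloc_sketch(size_t n, size_t size, gfp_t flags)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow, refuse */
	return kzalloc(n * size, flags);
}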


Re: [PATCH 00/23] kill drm cruft with fire

2011-12-05 Thread Daniel Vetter
On Wed, Nov 16, 2011 at 05:17:12PM +0100, Daniel Vetter wrote:
 On Mon, Nov 14, 2011 at 17:10, James Simmons jsimm...@infradead.org wrote:
   Should I test this set of patches for the VIA driver or wait until you
   have a second version of this patch?
 
  Testing this on via would be awesome! Iirc I haven't changed anything in
  the via specific patches, but if it's more convenient you can also
  directly test my branch:
 
  http://cgit.freedesktop.org/~danvet/drm/log/?h=kill-with-fire
 
  Okay I tried the patches and it locked up the openchrome X server. I'm
  going to try your branch tonight to see if it makes any difference. If it
  still fails I will have to track down what the problem is.

 If you can bisect the issue, that would be awesome. Meanwhile my SiS
 card arrived, so I'll hopefully get around to testing that part of the
 series real soon now. I'm traveling at the moment though, so response
 time will suffer a bit.

Any updates on testing results?

Yours, Daniel
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


[ANNOUNCE] libdrm 2.4.28

2011-12-05 Thread Chris Wilson
xf86-video-intel depends upon a couple of bug fixes in libdrm, hence the
version bump.
-Chris

Chris Wilson (2):
  intel: Unmap buffers during drm_intel_gem_bo_unmap
  configure: Bump version to 2.4.28

Daniel Vetter (1):
  intel: limit aperture space to mappable area on gen3

Jeremy Huddleston (1):
  Fix compilation with -Werror=int-to-pointer-cast 
-Werror=pointer-to-int-cast

Jerome Glisse (1):
  tests/radeon: radeon specific test

Maarten Lankhorst (1):
  nouveau: Mark nouveau subchannel unbound nouveau_grobj_free

git tag: 2.4.28

http://dri.freedesktop.org/libdrm/libdrm-2.4.28.tar.bz2
MD5:  6488f64119c6439fa4038e9cd7b30b67  libdrm-2.4.28.tar.bz2
SHA1: cb9f4e94d4ff4ec9ebf1066ef5f34c5ca63b8c38  libdrm-2.4.28.tar.bz2
SHA256: 315dc3a087b85b12559394742bc0b52f1877ee04a7b9b940241fd68da4d7244a  
libdrm-2.4.28.tar.bz2

http://dri.freedesktop.org/libdrm/libdrm-2.4.28.tar.gz
MD5:  a057daa3172033c4b573a63dc72c5813  libdrm-2.4.28.tar.gz
SHA1: acdeff3d42b56204d6a15b5e1cbf65bf3f42ae0e  libdrm-2.4.28.tar.gz
SHA256: 74b14d00d6e99b1867f7a5a74b1109d2b30a63ea4f52a961e54f7b01a48d768e  
libdrm-2.4.28.tar.gz

-- 
Chris Wilson, Intel Open Source Technology Centre




Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Arnd Bergmann
On Friday 02 December 2011, Sumit Semwal wrote:
 This is the first step in defining a dma buffer sharing mechanism.

This looks very nice, but there are a few things I don't understand yet
and a bunch of trivial comments I have about things I spotted.

Do you have prototype exporter and consumer drivers that you can post
for clarification?

In the patch 2, you have a section about migration that mentions that
it is possible to export a buffer that can be migrated after it
is already mapped into one user driver. How does that work when
the physical addresses are mapped into a consumer device already?

 diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
 index 21cf46f..07d8095 100644
 --- a/drivers/base/Kconfig
 +++ b/drivers/base/Kconfig
 @@ -174,4 +174,14 @@ config SYS_HYPERVISOR
  
  source "drivers/base/regmap/Kconfig"
  
 +config DMA_SHARED_BUFFER
 + bool "Buffer framework to be shared between drivers"
 + default n
 + depends on ANON_INODES

I would make this 'select ANON_INODES', like the other users of this
feature.

 + return dmabuf;
 +}
 +EXPORT_SYMBOL(dma_buf_export);

I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
because it's really a low-level function that I would expect
to get used by in-kernel subsystems providing the feature to
users and having back-end drivers, but it's not the kind of thing
we want out-of-tree drivers to mess with.

 +/**
 + * dma_buf_fd - returns a file descriptor for the given dma_buf
 + * @dmabuf:  [in]    pointer to dma_buf for which fd is required.
 + *
 + * On success, returns an associated 'fd'. Else, returns error.
 + */
 +int dma_buf_fd(struct dma_buf *dmabuf)
 +{
 + int error, fd;
 +
 + if (!dmabuf->file)
 + return -EINVAL;
 +
 + error = get_unused_fd_flags(0);

Why not simply get_unused_fd()?

 +/**
 + * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
 + * calls attach() of dma_buf_ops to allow device-specific attach 
 functionality
 + * @dmabuf:  [in]    buffer to attach device to.
 + * @dev:     [in]    device to be attached.
 + *
 + * Returns struct dma_buf_attachment * for this attachment; may return NULL.
 + *

Or may return a negative error code. It's better to be consistent here:
either always return NULL on error, or change the allocation error to
ERR_PTR(-ENOMEM).
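
A minimal sketch of that ERR_PTR() convention (attach_alloc is a
hypothetical helper, not from the patch; the struct definition is assumed
to come from the dma-buf header in this series):

#include <linux/err.h>
#include <linux/slab.h>

/* On failure the caller receives a real error code via IS_ERR()/PTR_ERR()
 * rather than an ambiguous NULL. */
static struct dma_buf_attachment *attach_alloc(void)
{
	struct dma_buf_attachment *attach;

	attach = kzalloc(sizeof(*attach), GFP_KERNEL);
	if (!attach)
		return ERR_PTR(-ENOMEM);
	return attach;
}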

 + */
 +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
 + struct device *dev)
 +{
 + struct dma_buf_attachment *attach;
 + int ret;
 +
 + BUG_ON(!dmabuf || !dev);
 +
 + attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
 + if (attach == NULL)
 + goto err_alloc;
 +
 + mutex_lock(&dmabuf->lock);
 +
 + attach->dev = dev;
 + attach->dmabuf = dmabuf;
 + if (dmabuf->ops->attach) {
 + ret = dmabuf->ops->attach(dmabuf, dev, attach);
 + if (!ret)
 + goto err_attach;

You probably mean if (ret) here instead of if (!ret), right?

 + /* allow allocator to take care of cache ops */
 + void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
 + void (*sync_sg_for_device)(struct dma_buf *, struct device *);

I don't see how this works with multiple consumers: For the streaming
DMA mapping, there must be exactly one owner, either the device or
the CPU. Obviously, this rule needs to be extended when you get to
multiple devices and multiple device drivers, plus possibly user
mappings. Simply assigning the buffer to the device from one
driver does not block other drivers from touching the buffer, and
assigning it to the cpu does not stop other hardware that the
code calling sync_sg_for_cpu is not aware of.

The only way to solve this that I can think of right now is to
mandate that the mappings are all coherent (i.e. noncachable
on noncoherent architectures like ARM). If you do that, you no
longer need the sync_sg_for_* calls.

 +#ifdef CONFIG_DMA_SHARED_BUFFER

Do you have a use case for making the interface compile-time disabled?
I had assumed that any code using it would make no sense if it's not
available so you don't actually need this.

Arnd


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Markus Trippelsdorf
On 2011.12.05 at 12:10 -0500, Jerome Glisse wrote:
 On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
  On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
   On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
   mar...@trippelsdorf.de wrote:
On 2011.12.03 at 12:20 +, Dave Airlie wrote:
 FIX idr_layer_cache: Marking all objects used
   
Yesterday I couldn't reproduce the issue at all. But today 
I've hit
exactly the same spot again. (CCing the drm list)
   
If I had to guess it looks like 0 is getting written back to some
random page by the GPU maybe, it could be that the GPU is in some half
setup state at boot or on a reboot does it happen from a cold boot or
just warm boot or kexec?
   
Only happened with kexec thus far. Cold boot seems to be fine.
   
   
   Can you add radeon.no_wb=1 to your kexec kernel parameter and see if
   you can reproduce.
  
  No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
  after 700 successful kexec iterations...)
  
 
 Can you try whether the attached patch fixes the issue when you don't pass
 the radeon.no_wb=1 option?

Yes the patch finally fixes the issue for me (tested with 120 kexec
iterations).
Thanks Jerome!

-- 
Markus


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Jerome Glisse
On Mon, Dec 5, 2011 at 1:15 PM, Markus Trippelsdorf
mar...@trippelsdorf.de wrote:
 On 2011.12.05 at 12:10 -0500, Jerome Glisse wrote:
 On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
  On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
   On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
   mar...@trippelsdorf.de wrote:
On 2011.12.03 at 12:20 +, Dave Airlie wrote:
 FIX idr_layer_cache: Marking all objects used
   
Yesterday I couldn't reproduce the issue at all. But today 
I've hit
exactly the same spot again. (CCing the drm list)
   
If I had to guess it looks like 0 is getting written back to some
random page by the GPU maybe, it could be that the GPU is in some half
setup state at boot or on a reboot does it happen from a cold boot or
just warm boot or kexec?
   
Only happened with kexec thus far. Cold boot seems to be fine.
   
  
    Can you add radeon.no_wb=1 to your kexec kernel parameter and see if
   you can reproduce.
 
  No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
  after 700 successful kexec iterations...)
 

 Can you try whether the attached patch fixes the issue when you don't pass
 the radeon.no_wb=1 option?

 Yes the patch finally fixes the issue for me (tested with 120 kexec
 iterations).
 Thanks Jerome!

 --
 Markus

Will respin with some minor code changes.

Cheers,
Jerome


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Daniel Vetter
On Mon, Dec 05, 2011 at 05:18:48PM +, Arnd Bergmann wrote:
 On Friday 02 December 2011, Sumit Semwal wrote:
  +   /* allow allocator to take care of cache ops */
  +   void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
  +   void (*sync_sg_for_device)(struct dma_buf *, struct device *);

 I don't see how this works with multiple consumers: For the streaming
 DMA mapping, there must be exactly one owner, either the device or
 the CPU. Obviously, this rule needs to be extended when you get to
 multiple devices and multiple device drivers, plus possibly user
 mappings. Simply assigning the buffer to the device from one
 driver does not block other drivers from touching the buffer, and
 assigning it to the cpu does not stop other hardware that the
 code calling sync_sg_for_cpu is not aware of.

 The only way to solve this that I can think of right now is to
 mandate that the mappings are all coherent (i.e. noncachable
 on noncoherent architectures like ARM). If you do that, you no
 longer need the sync_sg_for_* calls.

Whoops, totally missed the addition of these. Can somebody explain to
someone used to rather coherent x86 what we need these for, and what the
code-flow would look like in a typical example? I was kinda assuming that
devices would bracket their use of a buffer with the attachment_map/unmap
calls and any cache coherency magic that might be needed would be somewhat
transparent to users of the interface?

The map call gets the dma_data_direction parameter, so it should be able
to do the right thing. And because we keep the attachment around, any
caching of mappings should be possible, too.

Yours, Daniel

PS: Slightly related, because it will make the coherency nightmare worse,
afaict: Can we kill mmap support?
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Jerome Glisse
On Mon, Dec 05, 2011 at 07:15:49PM +0100, Markus Trippelsdorf wrote:
 On 2011.12.05 at 12:10 -0500, Jerome Glisse wrote:
  On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
   On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
mar...@trippelsdorf.de wrote:
 On 2011.12.03 at 12:20 +, Dave Airlie wrote:
  FIX idr_layer_cache: Marking all objects used

 Yesterday I couldn't reproduce the issue at all. But today 
 I've hit
 exactly the same spot again. (CCing the drm list)

 If I had to guess it looks like 0 is getting written back to some
 random page by the GPU maybe, it could be that the GPU is in some 
 half
 setup state at boot or on a reboot does it happen from a cold boot or
 just warm boot or kexec?

 Only happened with kexec thus far. Cold boot seems to be fine.


Can you add radeon.no_wb=1 to your kexec kernel parameter and see if
you can reproduce.
   
   No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
   after 700 successful kexec iterations...)
   
  
  Can you try whether the attached patch fixes the issue when you don't pass
  the radeon.no_wb=1 option?
 
 Yes the patch finally fixes the issue for me (tested with 120 kexec
 iterations).
 Thanks Jerome!
 
 -- 
 Markus

Can you do a quick run on the modified patch?

I believe this patch could go to stable too as it's low
impact from my pov.

Cheers,
Jerome
From cccfa6f93faa6b556fd72e318606a01e333e67d3 Mon Sep 17 00:00:00 2001
From: Jerome Glisse jgli...@redhat.com
Date: Mon, 5 Dec 2011 12:02:17 -0500
Subject: [PATCH] drm/radeon: disable possible GPU writeback early v2

Given how kexec works we need to disable any kind of GPU writeback
early in GPU initialization just in case some are still active from
previous setup.

v2: follow previous sanity work done on earlier radeons; also write
regs unconditionally and disable irq too.

Signed-off-by: Jerome Glisse jgli...@redhat.com
---
 drivers/gpu/drm/radeon/evergreen.c   |2 ++
 drivers/gpu/drm/radeon/ni.c  |   18 ++
 drivers/gpu/drm/radeon/nid.h |   19 +++
 drivers/gpu/drm/radeon/r100.c|   20 ++--
 drivers/gpu/drm/radeon/r520.c|2 +-
 drivers/gpu/drm/radeon/r600.c|   16 
 drivers/gpu/drm/radeon/radeon_asic.h |2 ++
 drivers/gpu/drm/radeon/rs600.c   |   20 +++-
 drivers/gpu/drm/radeon/rs600d.h  |   21 +
 drivers/gpu/drm/radeon/rs690.c   |2 +-
 drivers/gpu/drm/radeon/rv515.c   |2 +-
 drivers/gpu/drm/radeon/rv770.c   |   16 
 drivers/gpu/drm/radeon/rv770d.h  |   20 
 13 files changed, 142 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index 1934728..6109579 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3249,6 +3249,8 @@ int evergreen_init(struct radeon_device *rdev)
 {
int r;
 
+   /* restore some register to sane defaults */
+   rv770_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index c15fc8b..f5d7054 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1566,6 +1566,22 @@ int cayman_suspend(struct radeon_device *rdev)
return 0;
 }
 
+/*
+ * Due to how kexec works, it can leave the hw fully initialised when it
+ * boots the new kernel.
+ */
+static void cayman_restore_sanity(struct radeon_device *rdev)
+{
+   /* stop possible GPU activities */
+   WREG32(IH_RB_CNTL, 0);
+   WREG32(IH_CNTL, 0);
+   WREG32(CP_ME_CNTL, CP_ME_HALT | CP_PFP_HALT);
+   WREG32(SCRATCH_UMSK, 0);
+   WREG32(CP_RB0_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB1_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB2_CNTL, RB_NO_UPDATE);
+}
+
 /* Plan is to move initialization in that function and use
  * helper function so that radeon_device_init pretty much
  * do nothing more than calling asic specific function. This
@@ -1577,6 +1593,8 @@ int cayman_init(struct radeon_device *rdev)
struct radeon_ring *ring = rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
int r;
 
+   /* restore some register to sane defaults */
+   cayman_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/nid.h b/drivers/gpu/drm/radeon/nid.h
index 4640334..3aa33c6 100644
--- a/drivers/gpu/drm/radeon/nid.h
+++ b/drivers/gpu/drm/radeon/nid.h
@@ -162,6 +162,25 @@
 #define HDP_MISC_CNTL  0x2F4C
 #define HDP_FLUSH_INVALIDATE_CACHE  (1 << 0)
 
+#define IH_RB_CNTL                        0x3e00
+#   

Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Markus Trippelsdorf
On 2011.12.05 at 14:11 -0500, Jerome Glisse wrote:
 On Mon, Dec 05, 2011 at 07:15:49PM +0100, Markus Trippelsdorf wrote:
  On 2011.12.05 at 12:10 -0500, Jerome Glisse wrote:
   On Sun, Dec 04, 2011 at 02:02:00AM +0100, Markus Trippelsdorf wrote:
On 2011.12.03 at 14:31 -0500, Jerome Glisse wrote:
 On Sat, Dec 3, 2011 at 7:29 AM, Markus Trippelsdorf
 mar...@trippelsdorf.de wrote:
  On 2011.12.03 at 12:20 +, Dave Airlie wrote:
   FIX idr_layer_cache: Marking all objects used
 
  Yesterday I couldn't reproduce the issue at all. But 
  today I've hit
  exactly the same spot again. (CCing the drm list)
 
  If I had to guess it looks like 0 is getting written back to some
  random page by the GPU maybe, it could be that the GPU is in some 
  half
  setup state at boot or on a reboot does it happen from a cold boot 
  or
  just warm boot or kexec?
 
  Only happened with kexec thus far. Cold boot seems to be fine.
 
 
 Can you add radeon.no_wb=1 to your kexec kernel parameter and see if
 you can reproduce.

No, I cannot reproduce the issue with radeon.no_wb=1. (I write this
after 700 successful kexec iterations...)

   
   Can you try if attached patch fix the issue when you don't pass the
   radeon.no_wb=1 option ?
  
  Yes the patch finally fixes the issue for me (tested with 120 kexec
  iterations).
  Thanks Jerome!
  
  -- 
  Markus
 
 Can you do a quick run on the modified patch?

This one is also OK after ~60 iterations.

-- 
Markus


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Arnd Bergmann
On Monday 05 December 2011 19:55:44 Daniel Vetter wrote:
  The only way to solve this that I can think of right now is to
  mandate that the mappings are all coherent (i.e. noncachable
  on noncoherent architectures like ARM). If you do that, you no
  longer need the sync_sg_for_* calls.
 
 Whoops, totally missed the addition of these. Can somebody explain to
 someone used to rather coherent x86 what we need these for, and what the
 code-flow would look like in a typical example? I was kinda assuming that
 devices would bracket their use of a buffer with the attachment_map/unmap
 calls and any cache coherency magic that might be needed would be somewhat
 transparent to users of the interface?

I'll describe how the respective functions work in the streaming mapping
API (dma_map_*): You start out with a buffer that is owned by the CPU,
i.e. the kernel can access it freely. When you call dma_map_sg or similar,
a noncoherent device reading the buffer requires the cache to be flushed
in order to see the data that was written by the CPU into the cache.

After dma_map_sg, the device can perform both read and write accesses
(depending on the flag to the map call), but the CPU is no longer allowed
to read (which would allocate a cache line that may become invalid but
remain marked as clean) or write (which would create a dirty cache line
without writing it back) that buffer.

Once the device is done, the driver calls dma_unmap_* and the buffer is
again owned by the CPU. The device can no longer access it (in fact
the address may no longer be backed if there is an iommu) and the CPU
can again read and write the buffer. On ARMv6 and higher, possibly some
other architectures, dma_unmap_* also needs to invalidate the cache
for the buffer, because due to speculative prefetching, there may also
be a new clean cache line with stale data from an earlier version of
the buffer.

Since map/unmap is an expensive operation, the interface was extended
to pass back the ownership to the CPU and back to the device while leaving
the buffer mapped. dma_sync_sg_for_cpu invalidates the cache in the same
way as dma_unmap_sg, so the CPU can access the buffer, and
dma_sync_sg_for_device hands it back to the device by performing the
same cache flush that dma_map_sg would do.

You could for example do this if you want video input with a
cacheable buffer, or in an rdma scenario with a buffer accessed
by a remote machine.

In case of software iommu (swiotlb, dmabounce), the map and sync
functions don't do cache management but instead copy data between
a buffer accessed by hardware and a different buffer accessed
by the user.
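
A condensed sketch of that ownership hand-off using the streaming DMA API
(error handling mostly elided; dev and a populated sg_table are assumed):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static void stream_buffer(struct device *dev, struct sg_table *sgt)
{
	/* CPU owns the buffer here and may read/write it freely. */
	if (dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_FROM_DEVICE) == 0)
		return;
	/* Device owns the buffer; the CPU must not read or write it. */

	/* ... device DMA runs ... */

	dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->nents, DMA_FROM_DEVICE);
	/* Ownership handed back to the CPU (caches invalidated as needed);
	 * the mapping itself stays in place, so this is cheap. */

	dma_sync_sg_for_device(dev, sgt->sgl, sgt->nents, DMA_FROM_DEVICE);
	/* Device owns it again without a full map/unmap cycle. */

	dma_unmap_sg(dev, sgt->sgl, sgt->nents, DMA_FROM_DEVICE);
	/* Final hand-back; on ARMv6+ this also invalidates speculatively
	 * loaded clean lines. */
}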

 The map call gets the dma_data_direction parameter, so it should be able
 to do the right thing. And because we keep the attachment around, any
 caching of mappings should be possible, too.
 
 Yours, Daniel
 
 PS: Slightly related, because it will make the coherency nightmare worse,
 afaict: Can we kill mmap support?

The mmap support is required (and only makes sense) for consistent mappings,
not for streaming mappings. The provider must ensure that if you map
something uncacheable into the kernel in order to provide consistency,
any mapping into user space must also be uncacheable. A driver
that wants to have the buffer mapped to user space (as many do) should
not need to know whether it is required to do cacheable or uncacheable
mapping because of some other driver, and it should not need to worry
about how to set up uncached mappings in user space.

Arnd
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Pekka Enberg
On Mon, Dec 5, 2011 at 9:27 PM, Markus Trippelsdorf
mar...@trippelsdorf.de wrote:
  Yes the patch finally fixes the issue for me (tested with 120 kexec
  iterations).
  Thanks Jerome!

  Can you do a quick run on the modified patch?

 This one is also OK after ~60 iterations.

Jerome, could you please include in the changelog a reference to this
LKML thread for context, and credit Markus for reporting and following
up to get the issue fixed?

  Pekka


[Bug 43538] New: libdrm-2.4.28: rbo.h is missing.

2011-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=43538

 Bug #: 43538
   Summary: libdrm-2.4.28: rbo.h is missing.
Classification: Unclassified
   Product: DRI
   Version: XOrg CVS
  Platform: Other
OS/Version: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: libdrm
AssignedTo: dri-devel@lists.freedesktop.org
ReportedBy: michel.herm...@gmail.com


Hi,
It seems the tarball generation is broken and rbo.h is missing from
tests/radeon/, resulting in an obvious gcc error:
  CC rbo.o
radeon_ttm.c:28:17: fatal error: rbo.h: No such file or directory

Thanks for your hard work.



Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

2011-12-05 Thread Jerome Glisse
On Mon, Dec 05, 2011 at 10:10:34PM +0200, Pekka Enberg wrote:
 On Mon, Dec 5, 2011 at 9:27 PM, Markus Trippelsdorf
 mar...@trippelsdorf.de wrote:
   Yes the patch finally fixes the issue for me (tested with 120 kexec
   iterations).
   Thanks Jerome!
 
  Can you do a kick run on the modified patch ?
 
  This one is also OK after ~60 iterations.
 
 Jerome, could you please include a reference to this LKML thread for
 context and attribution for Markus for reporting and following up to
 get the issue fixed in the changelog?
 
   Pekka

Attached is the updated patch; only the changelog is different. Thanks
Markus for testing this.

Cheers,
Jerome
From cccfa6f93faa6b556fd72e318606a01e333e67d3 Mon Sep 17 00:00:00 2001
From: Jerome Glisse jgli...@redhat.com
Date: Mon, 5 Dec 2011 12:02:17 -0500
Subject: [PATCH] drm/radeon: disable possible GPU writeback early v3

Given how kexec works we need to disable any kind of GPU writeback
early in GPU initialization just in case some are still active from
previous setup. This patch fixes the issue described in
the LKML thread:

WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

https://lkml.org/lkml/2011/12/5/466

Thanks to Markus Trippelsdorf for testing this.

v2: follow previous sanity work done on earlier radeons; also write
regs unconditionally and disable irq too.
v3: update changelog

Signed-off-by: Jerome Glisse jgli...@redhat.com
Tested-by: Markus Trippelsdorf mar...@trippelsdorf.de
---
 drivers/gpu/drm/radeon/evergreen.c   |2 ++
 drivers/gpu/drm/radeon/ni.c  |   18 ++
 drivers/gpu/drm/radeon/nid.h |   19 +++
 drivers/gpu/drm/radeon/r100.c|   20 ++--
 drivers/gpu/drm/radeon/r520.c|2 +-
 drivers/gpu/drm/radeon/r600.c|   16 
 drivers/gpu/drm/radeon/radeon_asic.h |2 ++
 drivers/gpu/drm/radeon/rs600.c   |   20 +++-
 drivers/gpu/drm/radeon/rs600d.h  |   21 +
 drivers/gpu/drm/radeon/rs690.c   |2 +-
 drivers/gpu/drm/radeon/rv515.c   |2 +-
 drivers/gpu/drm/radeon/rv770.c   |   16 
 drivers/gpu/drm/radeon/rv770d.h  |   20 
 13 files changed, 142 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index 1934728..6109579 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3249,6 +3249,8 @@ int evergreen_init(struct radeon_device *rdev)
 {
int r;
 
+   /* restore some register to sane defaults */
+   rv770_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index c15fc8b..f5d7054 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1566,6 +1566,22 @@ int cayman_suspend(struct radeon_device *rdev)
return 0;
 }
 
+/*
+ * Due to how kexec works, it can leave the hw fully initialised when it
+ * boots the new kernel.
+ */
+static void cayman_restore_sanity(struct radeon_device *rdev)
+{
+   /* stop possible GPU activities */
+   WREG32(IH_RB_CNTL, 0);
+   WREG32(IH_CNTL, 0);
+   WREG32(CP_ME_CNTL, CP_ME_HALT | CP_PFP_HALT);
+   WREG32(SCRATCH_UMSK, 0);
+   WREG32(CP_RB0_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB1_CNTL, RB_NO_UPDATE);
+   WREG32(CP_RB2_CNTL, RB_NO_UPDATE);
+}
+
 /* Plan is to move initialization in that function and use
  * helper function so that radeon_device_init pretty much
  * do nothing more than calling asic specific function. This
@@ -1577,6 +1593,8 @@ int cayman_init(struct radeon_device *rdev)
struct radeon_ring *ring = rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
int r;
 
+   /* restore some register to sane defaults */
+   cayman_restore_sanity(rdev);
/* This don't do much */
r = radeon_gem_init(rdev);
if (r)
diff --git a/drivers/gpu/drm/radeon/nid.h b/drivers/gpu/drm/radeon/nid.h
index 4640334..3aa33c6 100644
--- a/drivers/gpu/drm/radeon/nid.h
+++ b/drivers/gpu/drm/radeon/nid.h
@@ -162,6 +162,25 @@
 #define HDP_MISC_CNTL  0x2F4C
 #define HDP_FLUSH_INVALIDATE_CACHE  (1 << 0)
 
+#define IH_RB_CNTL                        0x3e00
+#   define IH_RB_ENABLE                   (1 << 0)
+#   define IH_IB_SIZE(x)                  ((x) << 1) /* log2 */
+#   define IH_RB_FULL_DRAIN_ENABLE        (1 << 6)
+#   define IH_WPTR_WRITEBACK_ENABLE       (1 << 8)
+#   define IH_WPTR_WRITEBACK_TIMER(x)     ((x) << 9) /* log2 */
+#   define IH_WPTR_OVERFLOW_ENABLE        (1 << 16)
+#   define IH_WPTR_OVERFLOW_CLEAR         (1 << 31)
+#define IH_CNTL   0x3e18
+#   define 

Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Rob Clark
On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann a...@arndb.de wrote:
 On Friday 02 December 2011, Sumit Semwal wrote:
 This is the first step in defining a dma buffer sharing mechanism.

 This looks very nice, but there are a few things I don't understand yet
 and a bunch of trivial comments I have about things I spotted.

 Do you have prototype exporter and consumer drivers that you can post
 for clarification?

There are some dummy drivers based on an earlier version.  And airlied
has a prime (multi-gpu) prototype:

http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

I've got a nearly working camera+display prototype:

https://github.com/robclark/kernel-omap4/commits/dmabuf

 In the patch 2, you have a section about migration that mentions that
 it is possible to export a buffer that can be migrated after it
 is already mapped into one user driver. How does that work when
 the physical addresses are mapped into a consumer device already?

I think you can do physical migration if you are attached, but
probably not if you are mapped.

 diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
 index 21cf46f..07d8095 100644
 --- a/drivers/base/Kconfig
 +++ b/drivers/base/Kconfig
 @@ -174,4 +174,14 @@ config SYS_HYPERVISOR

  source "drivers/base/regmap/Kconfig"

 +config DMA_SHARED_BUFFER
 +     bool "Buffer framework to be shared between drivers"
 +     default n
 +     depends on ANON_INODES

 I would make this 'select ANON_INODES', like the other users of this
 feature.

 +     return dmabuf;
 +}
 +EXPORT_SYMBOL(dma_buf_export);

 I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
 because it's really a low-level function that I would expect
 to get used by in-kernel subsystems providing the feature to
 users and having back-end drivers, but it's not the kind of thing
 we want out-of-tree drivers to mess with.

 +/**
 + * dma_buf_fd - returns a file descriptor for the given dma_buf
 + * @dmabuf:  [in]    pointer to dma_buf for which fd is required.
 + *
 + * On success, returns an associated 'fd'. Else, returns error.
 + */
 +int dma_buf_fd(struct dma_buf *dmabuf)
 +{
 +     int error, fd;
 +
 +     if (!dmabuf->file)
 +             return -EINVAL;
 +
 +     error = get_unused_fd_flags(0);

 Why not simply get_unused_fd()?

 +/**
 + * dma_buf_attach - Add the device to dma_buf's attachments list; 
 optionally,
 + * calls attach() of dma_buf_ops to allow device-specific attach 
 functionality
 + * @dmabuf:  [in]    buffer to attach device to.
 + * @dev:     [in]    device to be attached.
 + *
 + * Returns struct dma_buf_attachment * for this attachment; may return NULL.
 + *

 Or may return a negative error code. It's better to be consistent here:
 either always return NULL on error, or change the allocation error to
 ERR_PTR(-ENOMEM).

 + */
 +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
 +                                             struct device *dev)
 +{
 +     struct dma_buf_attachment *attach;
 +     int ret;
 +
 +     BUG_ON(!dmabuf || !dev);
 +
 +     attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
 +     if (attach == NULL)
 +             goto err_alloc;
 +
 +     mutex_lock(&dmabuf->lock);
 +
 +     attach->dev = dev;
 +     attach->dmabuf = dmabuf;
 +     if (dmabuf->ops->attach) {
 +             ret = dmabuf->ops->attach(dmabuf, dev, attach);
 +             if (!ret)
 +                     goto err_attach;

 You probably mean if (ret) here instead of if (!ret), right?

 +     /* allow allocator to take care of cache ops */
 +     void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
 +     void (*sync_sg_for_device)(struct dma_buf *, struct device *);

 I don't see how this works with multiple consumers: For the streaming
 DMA mapping, there must be exactly one owner, either the device or
 the CPU. Obviously, this rule needs to be extended when you get to
 multiple devices and multiple device drivers, plus possibly user
 mappings. Simply assigning the buffer to the device from one
 driver does not block other drivers from touching the buffer, and
 assigning it to the cpu does not stop other hardware that the
 code calling sync_sg_for_cpu is not aware of.

 The only way to solve this that I can think of right now is to
 mandate that the mappings are all coherent (i.e. noncachable
 on noncoherent architectures like ARM). If you do that, you no
 longer need the sync_sg_for_* calls.

My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
ioctls and corresponding dmabuf ops, which userspace is required to
call before / after CPU access.  Or just remove mmap() and do the
mmap() via allocating device and use that device's equivalent
DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
would give you a way to (a) synchronize with gpu/asynchronous
pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
that gives you a convenient place to do cache operations on
noncoherent architecture.
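
A hypothetical sketch of such CPU-access bracket ioctls (the names and
numbers are invented for illustration, not a proposed ABI):

#include <linux/ioctl.h>

/* Userspace would call PREP before and FINI after touching the buffer,
 * giving the kernel a place to do cache maintenance on noncoherent
 * systems and to wait for pending device access. */
#define DMABUF_CPU_PREP	_IO('Z', 0x01)
#define DMABUF_CPU_FINI	_IO('Z', 0x02)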

Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Daniel Vetter
On Mon, Dec 05, 2011 at 08:29:49PM +0100, Arnd Bergmann wrote:
 On Monday 05 December 2011 19:55:44 Daniel Vetter wrote:
   The only way to solve this that I can think of right now is to
   mandate that the mappings are all coherent (i.e. noncachable
   on noncoherent architectures like ARM). If you do that, you no
   longer need the sync_sg_for_* calls.
 
  Whoops, totally missed the addition of these. Can somebody explain to
  someone used to rather coherent x86 what we need these for, and what the
  code-flow would look like in a typical example? I was kinda assuming that
  devices would bracket their use of a buffer with the attachment_map/unmap
  calls and any cache coherency magic that might be needed would be somewhat
  transparent to users of the interface?

 I'll describe how the respective functions work in the streaming mapping
 API (dma_map_*): You start out with a buffer that is owned by the CPU,
 i.e. the kernel can access it freely. When you call dma_map_sg or similar,
 a noncoherent device reading the buffer requires the cache to be flushed
 in order to see the data that was written by the CPU into the cache.

 After dma_map_sg, the device can perform both read and write accesses
 (depending on the flag to the map call), but the CPU is no longer allowed
 to read (which would allocate a cache line that may become invalid but
 remain marked as clean) or write (which would create a dirty cache line
 without writing it back) that buffer.

 Once the device is done, the driver calls dma_unmap_* and the buffer is
 again owned by the CPU. The device can no longer access it (in fact
 the address may no longer be backed if there is an iommu) and the CPU
 can again read and write the buffer. On ARMv6 and higher, possibly some
 other architectures, dma_unmap_* also needs to invalidate the cache
 for the buffer, because due to speculative prefetching, there may also
 be a new clean cache line with stale data from an earlier version of
 the buffer.

 Since map/unmap is an expensive operation, the interface was extended
 to pass back the ownership to the CPU and back to the device while leaving
 the buffer mapped. dma_sync_sg_for_cpu invalidates the cache in the same
 way as dma_unmap_sg, so the CPU can access the buffer, and
 dma_sync_sg_for_device hands it back to the device by performing the
 same cache flush that dma_map_sg would do.

 You could for example do this if you want video input with a
 cacheable buffer, or in an rdma scenario with a buffer accessed
 by a remote machine.

 In case of software iommu (swiotlb, dmabounce), the map and sync
 functions don't do cache management but instead copy data between
 a buffer accessed by hardware and a different buffer accessed
 by the user.

Thanks a lot for this excellent overview. I think at least for the first
version of dmabuf we should drop the sync_* interfaces and simply require
users to bracket their usage of the buffer from the attached device by
map/unmap. A dma_buf provider is always free to cache the mapping and
simply call the sync_sg_for_* functions of the streaming dma api.

If it later turns out that we want to be able to cache the sg list also on
the use-site in the driver (e.g. map it into some hw sg list) we can
always add that functionality later. I just fear that sync ops among N
devices are a bit ill-defined and we already have a plethora of ill-defined
issues at hand. Also the proposed api doesn't quite fit into what's
already there; I think an s/device/dma_buf_attachment/ would be more
consistent - otherwise the dmabuf provider might need to walk the list of
attachments to get at the right one for the device.

  The map call gets the dma_data_direction parameter, so it should be able
  to do the right thing. And because we keep the attachment around, any
  caching of mappings should be possible, too.
 
  Yours, Daniel
 
  PS: Slightly related, because it will make the coherency nightmare worse,
  afaict: Can we kill mmap support?

 The mmap support is required (and only makes sense) for consistent mappings,
 not for streaming mappings. The provider must ensure that if you map
 something uncacheable into the kernel in order to provide consistency,
 any mapping into user space must also be uncacheable. A driver
 that wants to have the buffer mapped to user space (as many do) should
 not need to know whether it is required to do cacheable or uncacheable
 mapping because of some other driver, and it should not need to worry
 about how to set up uncached mappings in user space.

Either I've always missed it or no one ever described it that concisely,
but now I see the use-case for mmap: Simpler drivers (i.e. not gpus) might
need to expose a userspace mapping and only the provider knows how to do
that in a coherent fashion. I want this in the docs (if it's not there yet
...).

But even with that use-case in mind I still have some gripes with the
current mmap interfaces as proposed:
- This use-case explains why the dmabuf provider needs to expose an mmap
  function. 

Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Daniel Vetter
On Mon, Dec 05, 2011 at 02:46:47PM -0600, Rob Clark wrote:
 On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann a...@arndb.de wrote:
  In the patch 2, you have a section about migration that mentions that
  it is possible to export a buffer that can be migrated after it
  is already mapped into one user driver. How does that work when
  the physical addresses are mapped into a consumer device already?

 I think you can do physical migration if you are attached, but
 probably not if you are mapped.

Yeah, that's very much how I see this, and also why map/unmap (at least
for simple users like v4l) should only bracket actual usage. GPU memory
managers need to be able to move around buffers while no one is using
them.

[snip]

  +     /* allow allocator to take care of cache ops */
  +     void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
  +     void (*sync_sg_for_device)(struct dma_buf *, struct device *);
 
  I don't see how this works with multiple consumers: For the streaming
  DMA mapping, there must be exactly one owner, either the device or
  the CPU. Obviously, this rule needs to be extended when you get to
  multiple devices and multiple device drivers, plus possibly user
  mappings. Simply assigning the buffer to the device from one
  driver does not block other drivers from touching the buffer, and
  assigning it to the cpu does not stop other hardware that the
  code calling sync_sg_for_cpu is not aware of.
 
  The only way to solve this that I can think of right now is to
  mandate that the mappings are all coherent (i.e. noncachable
  on noncoherent architectures like ARM). If you do that, you no
  longer need the sync_sg_for_* calls.

 My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
 ioctls and corresponding dmabuf ops, which userspace is required to
 call before / after CPU access.  Or just remove mmap() and do the
 mmap() via allocating device and use that device's equivalent
 DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
 would give you a way to (a) synchronize with gpu/asynchronous
 pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
 buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
 that gives you a convenient place to do cache operations on
 noncoherent architecture.

 I sort of preferred having the DMABUF shim because that lets you pass
 a buffer around userspace without the receiving code knowing about a
 device specific API.  But the problem I eventually came around to: if
 your GL stack (or some other userspace component) is batching up
 commands before submission to kernel, the buffers you need to wait for
 completion might not even be submitted yet.  So from kernel
 perspective they are ready for cpu access.  Even though in fact they
 are not in a consistent state from rendering perspective.  I don't
 really know a sane way to deal with that.  Maybe the approach instead
 should be a userspace level API (in libkms/libdrm?) to provide
 abstraction for userspace access to buffers rather than dealing with
 this at the kernel level.

Well, there's a reason GL has an explicit flush and extensions for sync
objects. It's to support such scenarios where the driver batches up gpu
commands before actually submitting them. Also, recent gpus all have (or
shortly will grow) multiple execution pipelines, so it's also important
that you sync up with the right command stream. Syncing up with all of
them is generally frowned upon for obvious reasons ;-)

So any userspace that interacts with an OpenGL driver needs to take care
of this anyway. But I think for simpler stuff (v4l) kernel only coherency
should work and userspace just needs to take care of gl interactions and
call glflush and friends at the right points. I think we can flesh this
out precisely when we spec the dmabuf EGL extension ... (or implement one
of the preexisting ones already around).

On the topic of a coherency model for dmabuf, I think we need to look at
dma_buf_attachment_map/unmap (and also the mmap variants cpu_start and
cpu_finish or whatever they might get called) as barriers:

So after a dma_buf_map, all previously completed dma operations (i.e.
unmap already called) and any cpu writes (i.e. cpu_finish called) will be
coherent. Similar rule holds for cpu access through the userspace mmap,
only writes completed before the cpu_start will show up.

Similar, writes done by the device are only guaranteed to show up after
the _unmap. Dito for cpu writes and cpu_finish.

In short we always need two function calls to denote the start/end of the
critical section.

Any concurrent operations are allowed to yield garbage, meaning any
combination of the old or either of the newly written contents (i.e.
non-overlapping writes might not actually all end up in the buffer,
but instead some old contents). Maybe we even need to loosen that to
the real undefined behaviour, but atm I can't think of an example.
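
A sketch of those barriers in code, using the attachment API from the RFC
(the signatures follow what later landed in mainline and may differ from
the RFC; cpu_start()/cpu_finish() remain hypothetical mmap-side brackets):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>

static void device_access(struct dma_buf_attachment *attach)
{
	struct sg_table *sgt;

	/* Barrier in: all previously completed device access (unmap called)
	 * and CPU writes (cpu_finish called) are coherent from here on. */
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR_OR_NULL(sgt))
		return;

	/* ... program the device against sgt; concurrent access by others
	 * may yield garbage, per the model above ... */

	/* Barrier out: device writes are only guaranteed visible to
	 * map()/cpu_start() calls made after this point. */
	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
}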

-Daniel
-- 
Daniel Vetter
Mail: dan...@ffwll.ch

[Bug 43538] libdrm-2.4.28: rbo.h is missing.

2011-12-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=43538

Arkadiusz Miskiewicz ar...@maven.pl changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED

--- Comment #1 from Arkadiusz Miskiewicz ar...@maven.pl 2011-12-05 13:34:10 
PST ---
Fixed at 902ee661f1864aaf8325621085f6a1b5a6a3673a



Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Arnd Bergmann
On Monday 05 December 2011 21:58:39 Daniel Vetter wrote:
 On Mon, Dec 05, 2011 at 08:29:49PM +0100, Arnd Bergmann wrote:
  ...
 
 Thanks a lot for this excellent overview. I think at least for the first
 version of dmabuf we should drop the sync_* interfaces and simply require
 users to bracket their usage of the buffer from the attached device by
 map/unmap. A dma_buf provider is always free to cache the mapping and
 simply call the sync_sg_for_* functions of the streaming dma api.

I think we still have the same problem if we allow multiple drivers
to access a noncoherent buffer using map/unmap:

        driver A        driver B

1.      read/write
2.                      read/write
3.      map()
4.                      read/write
5.      dma
6.                      map()
7.      dma
8.                      dma
9.      unmap()
10.                     dma
11.     read/write
12.                     unmap()



In step 4, the buffer is owned by device A, but accessed by driver B, which
is a bug. In step 11, the buffer is owned by device B but accessed by driver
A, which is the same bug on the other side. In steps 7 and 8, the buffer
is owned by both device A and B, which is currently undefined but would
be ok if both devices are on the same coherency domain. Whether that point
is meaningful depends on what the devices actually do. It would be ok
if both are only reading, but not if they write into the same location
concurrently.

As I mentioned originally, the problem could be completely avoided if
we only allow consistent (e.g. uncached) mappings or buffers that
are not mapped into the kernel virtual address space at all.

Alternatively, a clearer model would be to require each access to
nonconsistent buffers to be exclusive: a map() operation would have
to block until the current mapper (if any) has done an unmap(), and
any access from the CPU would also have to call a dma_buf_ops pointer
to serialize the CPU accesses with any device accesses. User
mappings of the buffer can be easily blocked during a DMA access
by unmapping the buffer from user space at map() time and blocking the
vm_ops->fault() operation until the unmap().
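
One way the exclusive-mapper rule could look (entirely hypothetical, not
part of the proposed API):

#include <linux/mutex.h>
#include <linux/wait.h>

struct excl_buf {
	struct mutex lock;
	bool mapped;			/* is some consumer currently mapped? */
	wait_queue_head_t wq;
};

static bool excl_try_claim(struct excl_buf *b)
{
	bool claimed;

	mutex_lock(&b->lock);
	claimed = !b->mapped;
	if (claimed)
		b->mapped = true;
	mutex_unlock(&b->lock);
	return claimed;
}

/* map() blocks, interruptibly (signals as the last resort against
 * deadlocks), until the current mapper has called unmap(). */
static int excl_map(struct excl_buf *b)
{
	return wait_event_interruptible(b->wq, excl_try_claim(b));
}

static void excl_unmap(struct excl_buf *b)
{
	mutex_lock(&b->lock);
	b->mapped = false;
	mutex_unlock(&b->lock);
	wake_up(&b->wq);		/* admit the next mapper */
}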

 If it later turns out that we want to be able to cache the sg list also on
 the use-site in the driver (e.g. map it into some hw sg list) we can
 always add that functionality later. I just fear that sync ops among N
 devices are a bit ill-defined and we already have a plethora of ill-defined
 issues at hand. Also the proposed api doesn't quite fit into what's
 already there; I think an s/device/dma_buf_attachment/ would be more
 consistent - otherwise the dmabuf provider might need to walk the list of
 attachments to get at the right one for the device.

Right, at least for the start, let's mandate just map/unmap and not provide
sync. I do wonder however whether we should implement consistent (possibly
uncached) or streaming (cacheable, but always owned by either the device
or the CPU, not both) buffers, or who gets to make the decision which
one is used if both are implemented.

   The map call gets the dma_data_direction parameter, so it should be able
   to do the right thing. And because we keep the attachment around, any
   caching of mappings should be possible, too.
  
   Yours, Daniel
  
   PS: Slightly related, because it will make the coherency nightmare worse,
   afaict: Can we kill mmap support?
 
  The mmap support is required (and only makes sense) for consistent mappings,
  not for streaming mappings. The provider must ensure that if you map
  something uncacheable into the kernel in order to provide consistency,
  any mapping into user space must also be uncacheable. A driver
  that wants to have the buffer mapped to user space (as many do) should
  not need to know whether it is required to do cacheable or uncacheable
  mapping because of some other driver, and it should not need to worry
  about how to set up uncached mappings in user space.
 
  Either I've always missed it or no one ever described it that concisely,
 but now I see the use-case for mmap: Simpler drivers (i.e. not gpus) might
 need to expose a userspace mapping and only the provider knows how to do
 that in a coherent fashion. I want this in the docs (if it's not there yet
 ...).

It's currently implemented in the ARM/PowerPC-specific dma_mmap_coherent
function and documented (hardly) in arch/arm/include/asm/dma-mapping.h.

We should make clear that this is actually an extension of the
regular dma mapping api that first needs to be made generic.

 But even with that use-case in mind I still have some gripes with the
 current mmap interfaces as proposed:
 - This use-case explains why the dmabuf provider needs to expose an mmap
   function. It 

Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Arnd Bergmann
On Monday 05 December 2011 14:46:47 Rob Clark wrote:
 On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann a...@arndb.de wrote:
  On Friday 02 December 2011, Sumit Semwal wrote:
  This is the first step in defining a dma buffer sharing mechanism.
 
  This looks very nice, but there are a few things I don't understand yet
  and a bunch of trivial comments I have about things I spotted.
 
  Do you have prototype exporter and consumer drivers that you can post
  for clarification?
 
 There are some dummy drivers based on an earlier version.  And airlied
 has a prime (multi-gpu) prototype:
 
 http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf
 
 I've got a nearly working camera+display prototype:
 
 https://github.com/robclark/kernel-omap4/commits/dmabuf

Ok, thanks. I think it would be good to post these for reference
in v3, with a clear indication that they are not being submitted
for discussion/inclusion yet.

  In the patch 2, you have a section about migration that mentions that
  it is possible to export a buffer that can be migrated after it
  is already mapped into one user driver. How does that work when
  the physical addresses are mapped into a consumer device already?
 
 I think you can do physical migration if you are attached, but
 probably not if you are mapped.

Ok, that's what I thought.

  You probably mean if (ret) here instead of if (!ret), right?
 
  + /* allow allocator to take care of cache ops */
  + void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
  + void (*sync_sg_for_device)(struct dma_buf *, struct device *);
 
  I don't see how this works with multiple consumers: For the streaming
  DMA mapping, there must be exactly one owner, either the device or
  the CPU. Obviously, this rule needs to be extended when you get to
  multiple devices and multiple device drivers, plus possibly user
  mappings. Simply assigning the buffer to the device from one
  driver does not block other drivers from touching the buffer, and
  assigning it to the cpu does not stop other hardware that the
  code calling sync_sg_for_cpu is not aware of.
 
  The only way to solve this that I can think of right now is to
  mandate that the mappings are all coherent (i.e. noncachable
  on noncoherent architectures like ARM). If you do that, you no
  longer need the sync_sg_for_* calls.
 
 My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
 ioctls and corresponding dmabuf ops, which userspace is required to
 call before / after CPU access.  Or just remove mmap() and do the
 mmap() via allocating device and use that device's equivalent
 DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
 would give you a way to (a) synchronize with gpu/asynchronous
 pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
 buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
 that gives you a convenient place to do cache operations on
 noncoherent architecture.

I wasn't even thinking of user mappings; as I replied to Daniel, I
think they are easy to solve (maybe not efficiently though).

 I sort of preferred having the DMABUF shim because that lets you pass
 a buffer around userspace without the receiving code knowing about a
 device specific API.  But the problem I eventually came around to: if
 your GL stack (or some other userspace component) is batching up
 commands before submission to kernel, the buffers you need to wait for
 completion might not even be submitted yet.  So from kernel
 perspective they are ready for cpu access.  Even though in fact they
 are not in a consistent state from rendering perspective.  I don't
 really know a sane way to deal with that.  Maybe the approach instead
 should be a userspace level API (in libkms/libdrm?) to provide
 abstraction for userspace access to buffers rather than dealing with
 this at the kernel level.

It would be nice if user space had no way to block out kernel drivers,
otherwise we have to be very careful to ensure that each map() operation
can be interrupted by a signal as the last resort to avoid deadlocks.

Arnd


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Rob Clark
On Mon, Dec 5, 2011 at 3:23 PM, Daniel Vetter dan...@ffwll.ch wrote:
 On Mon, Dec 05, 2011 at 02:46:47PM -0600, Rob Clark wrote:
 On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann a...@arndb.de wrote:
  In the patch 2, you have a section about migration that mentions that
  it is possible to export a buffer that can be migrated after it
  is already mapped into one user driver. How does that work when
  the physical addresses are mapped into a consumer device already?

 I think you can do physical migration if you are attached, but
 probably not if you are mapped.

 Yeah, that's very much how I see this, and also why map/unmap (at least
 for simple users like v4l) should only bracket actual usage. GPU memory
 managers need to be able to move around buffers while no one is using
 them.

 [snip]

  +     /* allow allocator to take care of cache ops */
  +     void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
  +     void (*sync_sg_for_device)(struct dma_buf *, struct device *);
 
  I don't see how this works with multiple consumers: For the streaming
  DMA mapping, there must be exactly one owner, either the device or
  the CPU. Obviously, this rule needs to be extended when you get to
  multiple devices and multiple device drivers, plus possibly user
  mappings. Simply assigning the buffer to the device from one
  driver does not block other drivers from touching the buffer, and
  assigning it to the cpu does not stop other hardware that the
  code calling sync_sg_for_cpu is not aware of.
 
  The only way to solve this that I can think of right now is to
  mandate that the mappings are all coherent (i.e. noncachable
  on noncoherent architectures like ARM). If you do that, you no
  longer need the sync_sg_for_* calls.

 My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
 ioctls and corresponding dmabuf ops, which userspace is required to
 call before / after CPU access.  Or just remove mmap() and do the
 mmap() via allocating device and use that device's equivalent
 DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
 would give you a way to (a) synchronize with gpu/asynchronous
 pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
 buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
 that gives you a convenient place to do cache operations on
 noncoherent architecture.
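
To make the first option concrete, an entirely hypothetical uapi sketch
of such DMABUF_CPU_{PREP,FINI} ioctls (nothing like this is in the
patch; the names and layout are invented for illustration):

#include <linux/ioctl.h>
#include <linux/types.h>

/* Userspace brackets CPU access so the kernel can wait for all attached
 * devices and do cache maintenance on noncoherent architectures. */
struct dma_buf_cpu_access {
        __u64 offset;           /* byte range the CPU will touch */
        __u64 length;
        __u32 flags;            /* hypothetical: read and/or write */
        __u32 pad;
};

#define DMABUF_CPU_PREP _IOW('b', 0x00, struct dma_buf_cpu_access)
#define DMABUF_CPU_FINI _IOW('b', 0x01, struct dma_buf_cpu_access)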

 I sort of preferred having the DMABUF shim because that lets you pass
 a buffer around userspace without the receiving code knowing about a
 device specific API.  But the problem I eventually came around to: if
 your GL stack (or some other userspace component) is batching up
 commands before submission to kernel, the buffers you need to wait for
 completion might not even be submitted yet.  So from kernel
 perspective they are ready for cpu access.  Even though in fact they
 are not in a consistent state from rendering perspective.  I don't
 really know a sane way to deal with that.  Maybe the approach instead
 should be a userspace level API (in libkms/libdrm?) to provide
 abstraction for userspace access to buffers rather than dealing with
 this at the kernel level.

 Well, there's a reason GL has an explicit flush and extensions for sync
 objects. It's to support such scenarios where the driver batches up gpu
 commands before actually submitting them.

Hmm.. what about other non-GL APIs..  maybe vaapi/vdpau or similar?
(Or something that I haven't thought of.)

 Also, recent gpus all have (or
 shortly will grow) multiple execution pipelines, so it's also important
 that you sync up with the right command stream. Syncing up with all of
 them is generally frowned upon for obvious reasons ;-)

Well, I guess I am happy enough with something that is at least
functional.  Userspace access would (I think) mainly be weird edge case
type stuff.  But...

 So any userspace that interacts with an OpenGL driver needs to take care
 of this anyway. But I think for simpler stuff (v4l) kernel only coherency
 should work and userspace just needs to take care of gl interactions and
 call glFlush and friends at the right points. I think we can flesh this
 out precisely when we spec the dmabuf EGL extension ... (or implement one
 of the preexisting ones already around).

.. yeah, I think egl/eglImage extension would be the right place to
hide this behind.  And I guess your GL stack should be able to figure
out which execution pipeline to sync, cache state of buffer, and
whatever other optimizations you might want to make.
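
For illustration, the usual producer-side discipline would be something
like the following sketch (fbo and draw_scene() are hypothetical; the
dmabuf EGL extension discussed here did not exist yet):

#include <GLES2/gl2.h>

static void draw_scene(void);   /* application rendering, hypothetical */

/* Finish a frame before handing the shared buffer to another API. */
static void finish_frame_for_sharing(GLuint fbo)
{
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        draw_scene();
        glFlush();      /* force batched commands down to the kernel */
        /* only now does the kernel know about all rendering to the
         * buffer, so kernel-level waiting becomes meaningful */
}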

 On the topic of a coherency model for dmabuf, I think we need to look at
 dma_buf_attachment_map/unmap (and also the mmap variants cpu_start and
 cpu_finish or whatever they might get called) as barriers:

 So after a dma_buf_map, all previously completed dma operations (i.e.
 unmap already called) and any cpu writes (i.e. cpu_finish called) will be
 coherent. A similar rule holds for cpu access through the userspace mmap,
 only writes completed before the cpu_start 
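
A hedged sketch of that barrier model (cpu_start/cpu_finish are the
hypothetical mmap-side brackets from this thread; the signatures are
assumed):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>

void cpu_start(struct dma_buf *buf);    /* hypothetical */
void cpu_finish(struct dma_buf *buf);   /* hypothetical */

/* Device A produces, then device B and finally the CPU consume, with
 * map/unmap (and cpu_start/cpu_finish) acting as coherency barriers. */
static void barrier_example(struct dma_buf *buf,
                            struct dma_buf_attachment *a,
                            struct dma_buf_attachment *b)
{
        struct sg_table *sgt;

        sgt = dma_buf_map_attachment(a, DMA_TO_DEVICE);
        /* ... device A DMA ... */
        dma_buf_unmap_attachment(a, sgt, DMA_TO_DEVICE);   /* release */

        sgt = dma_buf_map_attachment(b, DMA_FROM_DEVICE);  /* acquire: A's
                completed writes are now coherent for B */
        /* ... device B DMA ... */
        dma_buf_unmap_attachment(b, sgt, DMA_FROM_DEVICE);

        cpu_start(buf);         /* acquire for CPU access via the mmap */
        /* ... CPU reads the results coherently ... */
        cpu_finish(buf);
}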

Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Rob Clark
On Mon, Dec 5, 2011 at 4:09 PM, Arnd Bergmann a...@arndb.de wrote:

 https://github.com/robclark/kernel-omap4/commits/dmabuf

 Ok, thanks. I think it would be good to post these for reference
 in v3, with a clear indication that they are not being submitted
 for discussion/inclusion yet.

btw, don't look too closely at that tree yet.. where the
attach/detach is done in the videobuf2 code isn't really correct.  But I
was going to get something functioning first.

BR,
-R


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Daniel Vetter
On Mon, Dec 05, 2011 at 04:11:46PM -0600, Rob Clark wrote:
 On Mon, Dec 5, 2011 at 3:23 PM, Daniel Vetter dan...@ffwll.ch wrote:
  On Mon, Dec 05, 2011 at 02:46:47PM -0600, Rob Clark wrote:
  I sort of preferred having the DMABUF shim because that lets you pass
  a buffer around userspace without the receiving code knowing about a
  device specific API.  But the problem I eventually came around to: if
  your GL stack (or some other userspace component) is batching up
  commands before submission to kernel, the buffers you need to wait for
  completion might not even be submitted yet.  So from kernel
  perspective they are ready for cpu access.  Even though in fact they
  are not in a consistent state from rendering perspective.  I don't
  really know a sane way to deal with that.  Maybe the approach instead
  should be a userspace level API (in libkms/libdrm?) to provide
  abstraction for userspace access to buffers rather than dealing with
  this at the kernel level.
 
  Well, there's a reason GL has an explicit flush and extensions for sync
  objects. It's to support such scenarios where the driver batches up gpu
  commands before actually submitting them.

 Hmm.. what about other non-GL APIs..  maybe vaapi/vdpau or similar?
 (Or something that I haven't thought of.)

They generally all have a concept of when they've actually committed the
rendering to an X pixmap or egl image. Usually it's rather implicit, e.g.
the driver will submit any outstanding batches before returning from any
calls.
-Daniel
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-05 Thread Rob Clark
On Mon, Dec 5, 2011 at 4:09 PM, Arnd Bergmann a...@arndb.de wrote:
 On Monday 05 December 2011 14:46:47 Rob Clark wrote:
 I sort of preferred having the DMABUF shim because that lets you pass
 a buffer around userspace without the receiving code knowing about a
 device specific API.  But the problem I eventually came around to: if
 your GL stack (or some other userspace component) is batching up
 commands before submission to kernel, the buffers you need to wait for
 completion might not even be submitted yet.  So from kernel
 perspective they are ready for cpu access.  Even though in fact they
 are not in a consistent state from rendering perspective.  I don't
 really know a sane way to deal with that.  Maybe the approach instead
 should be a userspace level API (in libkms/libdrm?) to provide
 abstraction for userspace access to buffers rather than dealing with
 this at the kernel level.

 It would be nice if user space had no way to block out kernel drivers,
 otherwise we have to be very careful to ensure that each map() operation
 can be interrupted by a signal as the last resort to avoid deadlocks.

map_dma_buf should be documented to be allowed to return -EINTR..
otherwise, yeah, that would be problematic.
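
A sketch of what that contract would mean for an exporter's map_dma_buf
implementation (example_buffer, example_get_sg and the lock are all
hypothetical):

#include <linux/dma-buf.h>
#include <linux/err.h>
#include <linux/mutex.h>

struct example_buffer {                 /* hypothetical exporter state */
        struct mutex lock;
};

static struct sg_table *example_get_sg(struct example_buffer *ebuf,
                                       enum dma_data_direction dir);

/* Take the exporter's lock interruptibly so a stuck userspace mapping
 * can never block a kernel driver for good; callers must be prepared to
 * see -EINTR (or -ERESTARTSYS). */
static struct sg_table *example_map_dma_buf(struct dma_buf_attachment *attach,
                                            enum dma_data_direction dir)
{
        struct example_buffer *ebuf = attach->dmabuf->priv;
        struct sg_table *sgt;

        if (mutex_lock_interruptible(&ebuf->lock))
                return ERR_PTR(-EINTR);

        sgt = example_get_sg(ebuf, dir);        /* pin + map */
        mutex_unlock(&ebuf->lock);
        return sgt;
}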

        Arnd


Re: drm: Branch 'master' - 2 commits

2011-12-05 Thread Eric Anholt
On Mon,  5 Dec 2011 02:31:58 -0800 (PST), ic...@kemper.freedesktop.org (Chris Wilson) wrote:
  configure.ac |2 +-
  intel/intel_bufmgr_gem.c |   27 +--
  2 files changed, 22 insertions(+), 7 deletions(-)
 
 New commits:
 commit e73161a02b604742e3da3bca8f13cff81276de43
 Author: Chris Wilson ch...@chris-wilson.co.uk
 Date:   Mon Dec 5 10:30:52 2011 +
 
 configure: Bump version to 2.4.28
 
 So that we can pull a couple of Intel bug fixes into xf86-video-intel.
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk

Performance before:
[  0]   glfirefox-talos-gfx   17.866   17.915   0.14%4/4
after:
[  0]   glfirefox-talos-gfx   22.173   22.251   0.20%4/4

There's a pretty obvious opportunity to keep the performance win of the
userspace caching that you've broken here, but you gave none of us a
chance to review it before you pushed the patch *and shipped a release
with it*.  This is not acceptable.  Please revert and bump the release,
and fix it correctly.




Make ib allocation size function of cs size

2011-12-05 Thread j . glisse
The two following patches are on top of
http://cgit.freedesktop.org/~glisse/linux

They make the ib allocation size a function of the cs size, which
avoids wasting pool space and avoids triggering fence_wait
in ib_get. I am still evaluating how many fence_waits we avoid
with this.

Cheers,
Jerome



[PATCH 1/2] drm/radeon: make ib size variable

2011-12-05 Thread j . glisse
From: Jerome Glisse jgli...@redhat.com

This avoids wasting ib pool space and avoids a bunch of waits for
previous ibs to finish.

Signed-off-by: Jerome Glisse jgli...@redhat.com
---
 drivers/gpu/drm/radeon/r100.c  |2 +-
 drivers/gpu/drm/radeon/r600.c  |2 +-
 drivers/gpu/drm/radeon/r600_blit_kms.c |   16 +---
 drivers/gpu/drm/radeon/radeon.h|3 ++-
 drivers/gpu/drm/radeon/radeon_cs.c |6 --
 drivers/gpu/drm/radeon/radeon_ring.c   |7 +--
 6 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 657040b..947ba22 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -3708,7 +3708,7 @@ int r100_ib_test(struct radeon_device *rdev)
return r;
}
WREG32(scratch, 0xCAFEDEAD);
-   r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, &ib);
+   r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, &ib, 256);
if (r) {
return r;
}
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 951566f..4f08e5e 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2711,7 +2711,7 @@ int r600_ib_test(struct radeon_device *rdev, int ring)
return r;
}
WREG32(scratch, 0xCAFEDEAD);
-   r = radeon_ib_get(rdev, ring, &ib);
+   r = radeon_ib_get(rdev, ring, &ib, 256);
if (r) {
DRM_ERROR("radeon: failed to get ib (%d).\n", r);
return r;
diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c
index 02a7574..d996f43 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -619,16 +619,17 @@ void r600_blit_fini(struct radeon_device *rdev)
radeon_bo_unref(&rdev->r600_blit.shader_obj);
 }
 
-static int r600_vb_ib_get(struct radeon_device *rdev)
+static int r600_vb_ib_get(struct radeon_device *rdev, unsigned size)
 {
int r;
-   r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, &rdev->r600_blit.vb_ib);
+   r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX,
+ &rdev->r600_blit.vb_ib, size);
if (r) {
DRM_ERROR("failed to get IB for vertex buffer\n");
return r;
}
 
-   rdev->r600_blit.vb_total = 64*1024;
+   rdev->r600_blit.vb_total = size;
rdev->r600_blit.vb_used = 0;
return 0;
 }
@@ -693,10 +694,6 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages)
int num_loops = 0;
int dwords_per_loop = rdev->r600_blit.ring_size_per_loop;
 
-   r = r600_vb_ib_get(rdev);
-   if (r)
-   return r;
-
/* num loops */
while (num_gpu_pages) {
num_gpu_pages -=
@@ -705,6 +702,11 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages)
num_loops++;
}
 
+   /* 48 bytes for vertex per loop */
+   r = r600_vb_ib_get(rdev, (num_loops*48)+256);
+   if (r)
+   return r;
+
/* calculate number of loops correctly */
ring_size = num_loops * dwords_per_loop;
ring_size += rdev->r600_blit.ring_size_common;
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 6673f19..8cb6a58 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -713,7 +713,8 @@ struct r600_blit {
 
 void r600_blit_suspend(struct radeon_device *rdev);
 
-int radeon_ib_get(struct radeon_device *rdev, int ring, struct radeon_ib **ib);
+int radeon_ib_get(struct radeon_device *rdev, int ring,
+ struct radeon_ib **ib, unsigned size);
 void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib);
 int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib);
 int radeon_ib_pool_init(struct radeon_device *rdev);
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index b3bbf37..fdfc31b 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -288,7 +288,8 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
 * input memory (cached) and write to the IB (which can be
 * uncached).
 */
-   r =  radeon_ib_get(rdev, parser->ring, &parser->ib);
+   r =  radeon_ib_get(rdev, parser->ring, &parser->ib,
+  ib_chunk->length_dw * 4);
if (r) {
DRM_ERROR("Failed to get ib !\n");
return r;
@@ -348,7 +349,8 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
DRM_ERROR("cs IB too big: %d\n", ib_chunk->length_dw);
return -EINVAL;
}
-   r =  radeon_ib_get(rdev, parser->ring, &parser->ib);
+   r =  radeon_ib_get(rdev, parser->ring, &parser->ib,
+  ib_chunk->length_dw * 4);
  

[PATCH 2/2] drm/radeon: allocate semaphore from the ib pool

2011-12-05 Thread j . glisse
From: Jerome Glisse jgli...@redhat.com

This allows sharing the ib pool with semaphores and avoids
having more bos around.

Signed-off-by: Jerome Glisse jgli...@redhat.com
---
 drivers/gpu/drm/radeon/radeon.h   |   67 -
 drivers/gpu/drm/radeon/radeon_device.c|2 +-
 drivers/gpu/drm/radeon/radeon_ring.c  |5 +-
 drivers/gpu/drm/radeon/radeon_semaphore.c |  157 -
 4 files changed, 131 insertions(+), 100 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8cb6a58..5e35423 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -247,32 +247,6 @@ void radeon_fence_unref(struct radeon_fence **fence);
 int radeon_fence_count_emitted(struct radeon_device *rdev, int ring);
 
 /*
- * Semaphores.
- */
-struct radeon_ring;
-
-struct radeon_semaphore_driver {
-   rwlock_t lock;
-   struct list_head free;
-};
-
-struct radeon_semaphore {
-   struct radeon_bo *robj;
-   struct list_head list;
-   uint64_t gpu_addr;
-};
-
-void radeon_semaphore_driver_fini(struct radeon_device *rdev);
-int radeon_semaphore_create(struct radeon_device *rdev,
-   struct radeon_semaphore **semaphore);
-void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring,
- struct radeon_semaphore *semaphore);
-void radeon_semaphore_emit_wait(struct radeon_device *rdev, int ring,
-   struct radeon_semaphore *semaphore);
-void radeon_semaphore_free(struct radeon_device *rdev,
-  struct radeon_semaphore *semaphore);
-
-/*
  * Tiling registers
  */
 struct radeon_surface_reg {
@@ -410,6 +384,46 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv,
 uint32_t handle);
 
 /*
+ * Semaphores.
+ */
+struct radeon_ring;
+
+#define RADEON_SEMAPHORE_BO_SIZE 256
+
+struct radeon_semaphore_driver {
+   rwlock_t lock;
+   struct list_head bo;
+};
+
+struct radeon_semaphore_bo;
+
+/* everything here is constant */
+struct radeon_semaphore {
+   struct list_head list;
+   uint64_t gpu_addr;
+   uint32_t *cpu_ptr;
+   struct radeon_semaphore_bo *bo;
+};
+
+struct radeon_semaphore_bo {
+   struct list_head list;
+   struct radeon_ib *ib;
+   struct list_head free;
+   struct radeon_semaphore semaphores[RADEON_SEMAPHORE_BO_SIZE/8];
+   unsigned nused;
+};
+
+void radeon_semaphore_driver_fini(struct radeon_device *rdev);
+int radeon_semaphore_create(struct radeon_device *rdev,
+   struct radeon_semaphore **semaphore);
+void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring,
+ struct radeon_semaphore *semaphore);
+void radeon_semaphore_emit_wait(struct radeon_device *rdev, int ring,
+   struct radeon_semaphore *semaphore);
+void radeon_semaphore_free(struct radeon_device *rdev,
+  struct radeon_semaphore *semaphore);
+
+/*
  * GART structures, functions & helpers
  */
 struct radeon_mc;
@@ -716,6 +730,7 @@ void r600_blit_suspend(struct radeon_device *rdev);
 int radeon_ib_get(struct radeon_device *rdev, int ring,
  struct radeon_ib **ib, unsigned size);
 void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib);
+bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib);
 int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib);
 int radeon_ib_pool_init(struct radeon_device *rdev);
 void radeon_ib_pool_fini(struct radeon_device *rdev);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 6566860..aa9a11e 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -731,7 +731,7 @@ int radeon_device_init(struct radeon_device *rdev,
INIT_LIST_HEAD(&rdev->gem.objects);
init_waitqueue_head(&rdev->irq.vblank_queue);
init_waitqueue_head(&rdev->irq.idle_queue);
-   INIT_LIST_HEAD(&rdev->semaphore_drv.free);
+   INIT_LIST_HEAD(&rdev->semaphore_drv.bo);
/* initialize vm here */
rdev->vm_manager.use_bitmap = 1;
rdev->vm_manager.max_pfn = 1 << 20;
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 5f9edea..4fe320f 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -77,8 +77,7 @@ void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
 /*
  * IB.
  */
-static bool radeon_ib_try_free(struct radeon_device *rdev,
-  struct radeon_ib *ib)
+bool radeon_ib_try_free(struct 

[PATCH 0/2] omap/drm: dmm/tiler support for GEM buffers

2011-12-05 Thread Rob Clark
From: Rob Clark r...@ti.com

Support for DMM and tiled buffers.  The DMM/TILER block in omap4+ SoCs
provides support for remapping physically discontiguous buffers for
various DMA initiators (DSS, IVAHD, etc.) which do not otherwise support
discontiguous buffers, as well as providing support for
tiled buffers.

See the descriptions in the following two patches for more details.

Andy Gross (1):
  drm/omap: DMM/TILER support for OMAP4+ platform

Rob Clark (1):
  drm/omap: add GEM support for tiled/dmm buffers

 drivers/staging/omapdrm/Makefile   |   10 +-
 drivers/staging/omapdrm/TODO   |6 +
 drivers/staging/omapdrm/omap_dmm_priv.h|  187 
 drivers/staging/omapdrm/omap_dmm_tiler.c   |  672 ++
 drivers/staging/omapdrm/omap_dmm_tiler.h   |  130 +
 drivers/staging/omapdrm/omap_drm.h |2 +-
 drivers/staging/omapdrm/omap_drv.c |   27 +-
 drivers/staging/omapdrm/omap_drv.h |3 +
 drivers/staging/omapdrm/omap_fb.c  |2 +-
 drivers/staging/omapdrm/omap_gem.c |  432 --
 drivers/staging/omapdrm/omap_gem_helpers.c |   55 +++
 drivers/staging/omapdrm/omap_priv.h|7 +-
 drivers/staging/omapdrm/tcm-sita.c |  703 
 drivers/staging/omapdrm/tcm-sita.h |   95 
 drivers/staging/omapdrm/tcm.h  |  326 +
 15 files changed, 2609 insertions(+), 48 deletions(-)
 create mode 100644 drivers/staging/omapdrm/omap_dmm_priv.h
 create mode 100644 drivers/staging/omapdrm/omap_dmm_tiler.c
 create mode 100644 drivers/staging/omapdrm/omap_dmm_tiler.h
 create mode 100644 drivers/staging/omapdrm/tcm-sita.c
 create mode 100644 drivers/staging/omapdrm/tcm-sita.h
 create mode 100644 drivers/staging/omapdrm/tcm.h

-- 
1.7.5.4



[PATCH 2/2] drm/omap: add GEM support for tiled/dmm buffers

2011-12-05 Thread Rob Clark
From: Rob Clark r...@ti.com

TILER/DMM provides two features for omapdrm GEM objects:
1) providing a physically contiguous view to discontiguous memory
   for hw initiators that cannot otherwise support discontiguous
   buffers (DSS scanout, IVAHD video decode/encode, etc)
2) providing untiling for 2d tiled buffers, which are used in some
   cases to provide rotation and reduce memory bandwidth for hw
   initiators that tend to access data in 2d block patterns.

For 2d tiled buffers, there are some additional complications when
it comes to userspace mmap'ings.  For non-tiled buffers, the original
(potentially physically discontiguous) pages are used to back the
mmap.  For tiled buffers, we need to mmap via the tiler/dmm region to
provide an unswizzled view of the buffer.  But (a) the buffer is not
necessarily pinned in TILER all the time (it can be unmapped when
there is no DMA access to the buffer), and (b) when it is pinned, it
is not necessarily page aligned from the perspective of the CPU.  And
a non-page-aligned userspace buffer mapping is evil.

To solve this, we reserve one or more small regions in each of the 2d
containers when the driver is loaded to use as a user-GART where we
can create a second page-aligned mapping of parts of the buffer being
accessed from userspace.  Page faulting is used to evict and remap
different regions of whichever buffers are being accessed from user-
space.
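
As a hedged sketch of that fault-driven scheme (all helper names and
types below are hypothetical, not the actual omapdrm code; a
power-of-two slot size is assumed):

#include <linux/mm.h>

struct usergart_slot {
        unsigned long npages;           /* slot size in pages */
};

static struct usergart_slot *usergart_evict_lru(void);
static unsigned long usergart_pin_block(struct usergart_slot *slot,
                                        void *obj, pgoff_t pgoff);

/* On a fault in a 2d-tiled buffer's mmap: evict the least-recently-used
 * user-GART slot, pin the block of the buffer around the faulting page
 * through it, and insert page-aligned CPU mappings for the slot. */
static int usergart_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        struct usergart_slot *slot = usergart_evict_lru();
        pgoff_t pgoff = vmf->pgoff & ~(slot->npages - 1);
        unsigned long base = (unsigned long)vmf->virtual_address &
                        ~((slot->npages << PAGE_SHIFT) - 1);
        unsigned long pfn;
        unsigned long i;

        pfn = usergart_pin_block(slot, vma->vm_private_data, pgoff);

        for (i = 0; i < slot->npages; i++)
                vm_insert_mixed(vma, base + (i << PAGE_SHIFT), pfn + i);

        return VM_FAULT_NOPAGE;
}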

Signed-off-by: Rob Clark r...@ti.com
---
 drivers/staging/omapdrm/TODO   |5 +
 drivers/staging/omapdrm/omap_drv.c |6 +-
 drivers/staging/omapdrm/omap_drv.h |3 +
 drivers/staging/omapdrm/omap_fb.c  |2 +-
 drivers/staging/omapdrm/omap_gem.c |  432 +---
 drivers/staging/omapdrm/omap_gem_helpers.c |   55 
 6 files changed, 466 insertions(+), 37 deletions(-)

diff --git a/drivers/staging/omapdrm/TODO b/drivers/staging/omapdrm/TODO
index 18677e7..55b1837 100644
--- a/drivers/staging/omapdrm/TODO
+++ b/drivers/staging/omapdrm/TODO
@@ -22,6 +22,11 @@ TODO
 . Review DSS vs KMS mismatches.  The omap_dss_device is sort of part encoder,
   part connector.  Which results in a bit of duct tape to fwd calls from
   encoder to connector.  Possibly this could be done a bit better.
+. Solve PM sequencing on resume.  DMM/TILER must be reloaded before any
+  access is made from any component in the system.  Which means on suspend
+  CRTC's should be disabled, and on resume the LUT should be reprogrammed
+  before CRTC's are re-enabled, to prevent DSS from trying to DMA from a
+  buffer mapped in DMM/TILER before LUT is reloaded.
 . Add debugfs information for DMM/TILER
 
 Userspace:
diff --git a/drivers/staging/omapdrm/omap_drv.c b/drivers/staging/omapdrm/omap_drv.c
index 71de7cf..7ecf578 100644
--- a/drivers/staging/omapdrm/omap_drv.c
+++ b/drivers/staging/omapdrm/omap_drv.c
@@ -509,7 +509,7 @@ static int ioctl_gem_info(struct drm_device *dev, void *data,
return -ENOENT;
}
 
-   args->size = obj->size;  /* for now */
+   args->size = omap_gem_mmap_size(obj);
args->offset = omap_gem_mmap_offset(obj);
 
drm_gem_object_unreference_unlocked(obj);
@@ -557,6 +557,8 @@ static int dev_load(struct drm_device *dev, unsigned long flags)
 
dev->dev_private = priv;
 
+   omap_gem_init(dev);
+
ret = omap_modeset_init(dev);
if (ret) {
dev_err(dev->dev, "omap_modeset_init failed: ret=%d\n", ret);
@@ -589,8 +591,8 @@ static int dev_unload(struct drm_device *dev)
drm_kms_helper_poll_fini(dev);
 
omap_fbdev_free(dev);
-
omap_modeset_free(dev);
+   omap_gem_deinit(dev);
 
kfree(dev->dev_private);
dev->dev_private = NULL;
diff --git a/drivers/staging/omapdrm/omap_drv.h b/drivers/staging/omapdrm/omap_drv.h
index c8f2752..9d0783d 100644
--- a/drivers/staging/omapdrm/omap_drv.h
+++ b/drivers/staging/omapdrm/omap_drv.h
@@ -84,6 +84,8 @@ struct drm_connector *omap_framebuffer_get_next_connector(
 void omap_framebuffer_flush(struct drm_framebuffer *fb,
int x, int y, int w, int h);
 
+void omap_gem_init(struct drm_device *dev);
+void omap_gem_deinit(struct drm_device *dev);
 
 struct drm_gem_object *omap_gem_new(struct drm_device *dev,
union omap_gem_size gsize, uint32_t flags);
@@ -109,6 +111,7 @@ int omap_gem_get_paddr(struct drm_gem_object *obj,
dma_addr_t *paddr, bool remap);
 int omap_gem_put_paddr(struct drm_gem_object *obj);
 uint64_t omap_gem_mmap_offset(struct drm_gem_object *obj);
+size_t omap_gem_mmap_size(struct drm_gem_object *obj);
 
 static inline int align_pitch(int pitch, int width, int bpp)
 {
diff --git a/drivers/staging/omapdrm/omap_fb.c b/drivers/staging/omapdrm/omap_fb.c
index 82ed612..491be53 100644
--- a/drivers/staging/omapdrm/omap_fb.c
+++ b/drivers/staging/omapdrm/omap_fb.c
@@ -102,7 +102,7 @@ int omap_framebuffer_get_buffer(struct drm_framebuffer *fb, int x,