Re: [Dri-devel] r128 strangeness

2002-06-18 Thread Keith Whitwell

Eric Anholt wrote:
> I was working on os-independence, starting with the r128 driver because
> I have a linux machine ready with a r128 in it.  It's a gentoo 1.1
> system (2.4.18 vanilla),  Rage128 Mobility M4 on Inspiron 8000 (i815),
> and 4.2.0 was installed with no DRM.
> 
> I made World install with bsd-3-0-0-branch, and compiled r128.o from
> .../linux/drm/kernel.  Loaded the module, restarted X, direct rendering
> was enabled but the graphics are garbled.  Screen clearing isn't working
> and windows are missing graphics.  Upon logging in (which started
> gnome), the system crashed (alt-sysrq to reboot).  After rebooting,
> starting X again, and going to console, the XFree86.0.log is full of:
> 
> (EE) R128(0): Idle timed out, resetting engine...
> 
> bsd-3-0-0-branch has the same r128 ddx driver and kernel module code as
> trunk according to cvs diff -u -rHEAD.  The X Server, ddx, drm, and dri
> modules got installed, so I don't think it's a versioning issue.
> 
> Has anyone else seen this?
> 
> 

I don't know how many people are using r128's.

Keith



   Bringing you mounds of caffeinated joy
   >>> http://thinkgeek.com/sf<<<

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Keith Whitwell

Jens Owen wrote:
> Linus Torvalds wrote:
> 
> 
>>Yes, kernel support (or indirect rendering) is needed for untrusted
>>applications, but it might actually be interesting to see what a
>>direct-rendering all-user-land implementation looks like. It has some
>>debugging advantages, and it may actually make sense to start from a
>>totally trusted app that goes as fast as humanly possible, and then when
>>that has been optimized to death look at just where the interfaces make
>>the most sense..
>>
> 
> Keith,
> 
> Along these lines, I've been toying around with the idea that direct
> user level access to the ring for commands *might* be able to use a DRM
> locking policy similar to how we protected the ring in the TDFX driver
> where it was directly accessed by the user space driver and indirect
> buffers were only used for indirect kinds of data like arrays and
> textures.

I don't understand why you see the ring as being so special.  The hardware 
provides indirect buffers just so you don't have to have multiple clients 
contending for the ring.  I don't see any disadvantages to using that; the big 
advantage is you don't have to grab the lock each time you want to emit a few 
bytes to an indirect buffer (you would with the ring, and the fast path on the 
lock isn't *that* fast).

Arrays & textures have a third level of indirection provided by the hardware.
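
The locking argument above can be illustrated with a toy Python model. This is a sketch, not driver code: `HwLock`, `emit_via_ring`, and `emit_via_indirect` are hypothetical names, and the model only counts lock acquisitions. It assumes that every small emit to a shared ring must hold the lock, while a private indirect buffer is filled without the lock and submitted once.

```python
class HwLock:
    """Stand-in for the DRM hardware lock; it only counts acquisitions."""

    def __init__(self):
        self.acquisitions = 0

    def __enter__(self):
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


def emit_via_ring(lock, ring, commands):
    """Each small emit touches the shared ring, so each one takes the lock."""
    for cmd in commands:
        with lock:
            ring.append(cmd)


def emit_via_indirect(lock, ring, commands):
    """Commands accumulate in a private buffer; only the submit takes the lock."""
    indirect = list(commands)  # filled without holding the lock
    with lock:
        # The ring receives a single reference to the whole indirect buffer.
        ring.append(("INDIRECT", tuple(indirect)))


commands = [("VERTEX", i) for i in range(100)]

ring_lock, ring_a = HwLock(), []
emit_via_ring(ring_lock, ring_a, commands)

ib_lock, ring_b = HwLock(), []
emit_via_indirect(ib_lock, ring_b, commands)

print(ring_lock.acquisitions, ib_lock.acquisitions)
```

The model says nothing about how expensive the lock fast path really is; it only shows why the number of acquisitions differs between the two schemes.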

> We touched on this a few weeks ago on IRC, and IIRC you thought there
> might be some problems with coordinating access and aging buffers. 
> Would it be valuable if I were able to get a prototype going where the
> 2D server access and the kernel DRM module shared direct access to the
> ring protected by the DRM lock?  Obviously, multiple instances of a 3D
> driver would be an even better prototype, but I'm looking for an easier
> proof of concept:-)

This is a reasonable thing to do anyway as the 2d driver currently holds the 
lock over its entire operation (perhaps delayed until the first accel action), 
and is already 'trusted', so probably *should* use the ring directly rather 
than going through the ioctl overhead for accel, which typically comes to only 
a few 10's to 100's of bytes/ioctl.

> If I can get 2D and kernel sharing access, what problems do you foresee
> with getting multiple 3D clients to participate using a policy similar
> to TDFX?

The i810 actually works exactly the way you're describing.

Keith



[Dri-devel] r128 strangeness

2002-06-18 Thread Eric Anholt

I was working on os-independence, starting with the r128 driver because
I have a linux machine ready with a r128 in it.  It's a gentoo 1.1
system (2.4.18 vanilla),  Rage128 Mobility M4 on Inspiron 8000 (i815),
and 4.2.0 was installed with no DRM.

I made World install with bsd-3-0-0-branch, and compiled r128.o from
.../linux/drm/kernel.  Loaded the module, restarted X, direct rendering
was enabled but the graphics are garbled.  Screen clearing isn't working
and windows are missing graphics.  Upon logging in (which started
gnome), the system crashed (alt-sysrq to reboot).  After rebooting,
starting X again, and going to console, the XFree86.0.log is full of:

(EE) R128(0): Idle timed out, resetting engine...

bsd-3-0-0-branch has the same r128 ddx driver and kernel module code as
trunk according to cvs diff -u -rHEAD.  The X Server, ddx, drm, and dri
modules got installed, so I don't think it's a versioning issue.

Has anyone else seen this?

-- 
Eric Anholt <[EMAIL PROTECTED]>
http://gladstone.uoregon.edu/~eanholt/dri/




Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Jens Owen

Jeff Hartmann wrote:
> 
> Keith Whitwell wrote:
> 
> > Benjamin Herrenschmidt wrote:
> >
> >>> HOWEVER, if you tied the GART mapping to the DRM lock, you might be ok.
> >>> That gives you the required system exclusion, and if you make it an
> >>> explicit "get my GART context" function that is only called under
> >>> the DRM
> >>> lock _and_ only called when you actually need the AGP access, you also
> >>> avoid the unnecessary context switches.
> >>>
> >>> You might still have some performance issues simply because you
> >>> would do
> >>> extra work when switching aperture mappings, but hopefully the GART
> >>> switch
> >>> wouldn't be a common operation.
> >>>
> >>> The flexibility you would get _might_ be worth it.
> >>>
> >>
> >> Well, I would personally vote for the processes _not_ relying on having
> >> the AGP aperture mapped directly, but instead, the various memory pages
> >> making their AGP aperture. Several chipsets (Apple ones for sure, but it
> >> seems others are hitting this too nowadays) don't support AGP aperture
> >> accesses from the CPU.
> >
> >
> > What are you actually saying, that pages mapped in agp can't be
> > written by any means, or just that they can't be written through the
> > agp address range?
> >
> > It sounds kind of broken to me in any case.  How do MTRRs work in this
> > world?
> 
> Actually we should go to this model eventually.  However it needs me to
> have time to finish the Page Attribute Table support I started on at
> VA.  This allows write combining to be set on a per page basis, and is
> the direction we want to go even on x86.
> 
> >
> >
> >> That way, if you want several AGP contexts, you can have the processes
> >> tapping their AGP buffers without lock, locking would only be required
> >> once it's time to move one of these buffers in/out the physical GART
> >> under the arbitration of the DRM.
> >
> >
> > You don't need to lock to write to agp buffers in the current scheme.
> >
> > You also don't need to play with the gart table just to draw a
> > 2-triangle  strip.  On some chipsets, particularly under smp,
> > modifying the gart table is  very slow.  Ask Jeff about this.
> >
> > Keith
> >
> This is also true, but I've done a lot of heavy thinking on this very
> issue.  The key is to manage the agp aperture and only swap out regions
> when you absolutely have to.  The big key to getting something like
> this to work is a memory manager that every client uses, and is based on
> some sort of sarea.  It should be designed with a certain minimum block
> size, and have a few different flags for what kind of usage that memory
> block has.  (I can go into more detail on design, but you probably have
> a good idea what I mean here.)  Then the next step is to create kernel
> calls which can swap things to and from agp space and the card.  On
> cards that support it, another path (which prevents GART rewrites
> entirely) is to add support to swap to normal cached memory.
> This is what I envision making sense in the long run.  A global
> memory manager using an sarea (doesn't have to be the main one) and a
> good aging mechanism get us most of the way there.
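
A minimal sketch of the kind of manager described in the quoted text, written in Python purely for illustration. Everything here is an assumption rather than an existing interface: the `ApertureManager` class, the 64 KiB `BLOCK_SIZE`, the usage strings, and least-recently-used eviction as the aging policy. A real implementation would live in a shared area and in kernel calls, not in a Python object.

```python
from collections import OrderedDict

BLOCK_SIZE = 64 * 1024  # assumed minimum block size, chosen arbitrarily


class ApertureManager:
    """Toy block manager: fixed-size blocks, usage flags, LRU aging."""

    def __init__(self, aperture_bytes):
        self.free = list(range(aperture_bytes // BLOCK_SIZE))
        self.resident = OrderedDict()  # block -> (owner, usage), oldest first
        self.swaps = 0                 # how many blocks had to be swapped out

    def alloc(self, owner, usage):
        """Return a block index, evicting the oldest block only if none is free."""
        if not self.free:
            victim, _ = self.resident.popitem(last=False)
            self.swaps += 1
            self.free.append(victim)
        block = self.free.pop(0)
        self.resident[block] = (owner, usage)
        return block

    def touch(self, block):
        """Mark a block as recently used so it ages more slowly."""
        self.resident.move_to_end(block)


mgr = ApertureManager(2 * BLOCK_SIZE)
a = mgr.alloc("client-1", "texture")
b = mgr.alloc("client-1", "vertex")
mgr.touch(a)                       # block a is now the most recently used
c = mgr.alloc("client-2", "texture")
print(c, mgr.swaps)
```

With only two blocks available, the third allocation has to evict something; because `a` was touched last, the older block `b` is the one swapped out.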

Jeff,

It might be helpful to clarify the different uses we are discussing WRT
AGP.  In this thread so far, we've been jumping all over.  Here's a
shot at an AGP breakdown.  Feel free to correct my misconceptions.

1) The original utilization of AGP under Linux is faster MMIO
transactions than PCI.  Some level of improvement happens here by simply
accessing a device on an AGP bus, and no special AGP programming is
required.

2) Simple MMIO transactions can be optimized by enabling fast writes. 
This case is identical to the MMIO transactions in the first case, but
the bus and graphics chipset utilize hardware pipelining to increase
throughput.  There is a penalty for turning the bus around
write/read/write/read because of the pipelining.  There are also certain
combinations of host chipsets and graphics chips where enabling fast
writes can cause hangs.

The remaining cases all utilize AGP bus mastering where the graphics
chip can read and write directly from AGP memory.
 
3) Static AGP Allocation.  This is the primary functionality that the
agpgart module provides today.  Physical memory is allocated by agpgart
as needed and that memory is managed on behalf of the user space and DRM
drivers at run time.  There is a finite amount of this memory available
dictated by the size of the AGP aperture (typically 64M).  We have not
fully exploited this case in user space, yet.  The prototype for the AGP
allocator and transfer mechanism of glDrawPixels in the Matrox G400
driver is a good example of the potential here.

4) Dynamic AGP Binding.  This functionality is spec'ed in the agpgart
interface but is not fully implemented, yet.  The intention is for user
space processes to be able to bind normal virtual pages to the AGP
aperture in a very dynamic fashion.  Some of the discussions about
binding and unbindin

Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Jens Owen

Linus Torvalds wrote:

> Yes, kernel support (or indirect rendering) is needed for untrusted
> applications, but it might actually be interesting to see what a
> direct-rendering all-user-land implementation looks like. It has some
> debugging advantages, and it may actually make sense to start from a
> totally trusted app that goes as fast as humanly possible, and then when
> that has been optimized to death look at just where the interfaces make
> the most sense..

Keith,

Along these lines, I've been toying around with the idea that direct
user level access to the ring for commands *might* be able to use a DRM
locking policy similar to how we protected the ring in the TDFX driver
where it was directly accessed by the user space driver and indirect
buffers were only used for indirect kinds of data like arrays and
textures.

We touched on this a few weeks ago on IRC, and IIRC you thought there
might be some problems with coordinating access and aging buffers. 
Would it be valuable if I were able to get a prototype going where the
2D server access and the kernel DRM module shared direct access to the
ring protected by the DRM lock?  Obviously, multiple instances of a 3D
driver would be an even better prototype, but I'm looking for an easier
proof of concept:-)

If I can get 2D and kernel sharing access, what problems do you foresee
with getting multiple 3D clients to participate using a policy similar
to TDFX?

Regards,
Jens

-- /\
 Jens Owen/  \/\ _
  [EMAIL PROTECTED]  /\ \ \   Steamboat Springs, Colorado


Re: [Dri-devel] radeon driver on current trunk

2002-06-18 Thread Michel Dänzer

On Tue, 2002-06-18 at 18:57, Michael wrote: 
> On Mon, Jun 17, 2002 at 02:26:49AM +0200, Michel Dänzer wrote:
> > - Portability fixes for the new driver:
> > http://penguinppc.org/~daenzer/DRI/radeon-endianness.diff
> > Feedback and testing appreciated as always, in particular on the changes
> > to the x86 specific parts.
> 
> I've been running with this for a few hours now, seems fine.

Great, I've committed them. Thanks for testing.


-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member   /  CS student, Free Software enthusiast


Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Benjamin Herrenschmidt

>
>
>On Mon, 17 Jun 2002, Benjamin Herrenschmidt wrote:
>>
>> >mmap() and AGP driver gives access to IOMEM/AGP
>>
>> That one is problematic. I don't support the mmap interface properly
>> on Apple chipsets for example, because they don't support the AGP
>> aperture being accessed by the CPU.
>
>I assume you mean that the CPU doesn't honour the AGP mappings, but the
>CPU _can_ access the physical pages themselves.

Of course, it would be pretty useless if the aperture couldn't be used
at all ;) Sorry for the confusion.

>How do you do it right
>now, since we seem to be doing "ioremap_nocache()" all over the place with
>the AGP aperture?

Not that much. I really only bothered with the r128 and radeon drivers,
the kernel side uses a few well-localized ioremap calls that I turned
into an agp_ioremap call. I then implement that function by directly
building a virtual mapping from the underlying AGP pages.
The userland side is using drmMap, which already sets up some vmops
for various cases, I just had to add a specific case for this kind of
AGP hosts that give the real RAM pages on the fly.

>But fundamentally that should not be a problem: we can map the (unmapped)
>AGP pages one page at a time (rather than as one contiguous block of
>remapped pages) into user mode.

Using vmops is easy, provided that nobody plays tricks like binding/unbinding
memory from the GART behind our back. That is my main problem with the
current ioctl interface to agpgart. Basically, the API allows this and
the agptest program itself will, for example, map the aperture before
binding memory to it. It's still workable provided the aperture isn't
accessed before memory is bound, but if we allow dynamic binding/unbinding
of memory while the entire aperture is mmap'ed in some process space,
then we have to potentially tear down mappings of those other processes
on unbind, which is beyond my knowledge of linux vm (especially on SMP).

>I thought AGP already supported a mmap() interface, and if it really
>doesn't, it should be trivial to do...
>
> [ Time passes, Linus looks at the sources ]
>
>Ok, there does seem to be mmap() support in the AGP module, but it seems
>to use that stupid "remap_page_range()" and the AGP base (similar to
>ioremap() inside the kernel), so it does seem to mmap the _mapped_ AGP
>area.
>
>It would be possible to just install a "nopage" handler, and map one page
>at a time on demand from the pool of (non-GART-mapped) pages that we keep
>in the gatt_table[] or whatever.

Yup, exactly like what I do with drmMap, but I still don't like the API
for the reason I just explained.

>Maybe there is some reason for doing it that way that I don't understand.
>More likely, it's just done that way because it was the simple and stupid
>approach.
>
>However, you seem to prefer a different approach, which would certainly
>work:
>
>> I would much prefer the agpgart interface to be redesigned around
>> different semantics, mostly vmalloc() some space to use as AGP memory,
>> then bind that to the GART, but don't rely on direct AGP aperture
>> access.
>>
>> There are also some slight speed improvements to win using this
>> scheme as I could map the AGP memory as cacheable (which would give a
>> significant boost on PPC) provided buffers & ring get properly flushed
>> before being "passed" to the chip.
>
>Hmm.. It would be fairly simple to do all page allocation in user space,
>and have an interface that says "put the physical page corresponding to my
>virtual address  into the AGP aperture at offset ".
>
>This would effectively disallow the above "map by unmapped page" approach,
>because it's too damn expensive to find and flush any existing mappings
>when somebody maps in a new page. And if not all systems support the
>GART-assisted CPU mapping that we do now, that means that nobody can mmap
>the AGP area into memory.

Exactly.

>The expensive part would be the "mark this page uncacheable" when moving
>it to the AGP buffer, which implies a cross-CPU TLB flush for each such
>page. So moving a page into the AGP aperture is fundamentally a fairly
>expensive operation: wbinvd itself takes a _loong_ time, but if you have
>to do it on all CPU's along with the TLB flush, it gets _really_
>expensive.
>
>So moving pages that way is definitely not cheap either. Hmm.

What about simpler semantics? A given client needs well-known chunks of
AGP memory (the ring buffer, the indirect buffers, etc...). All we really
need is _one_ call to allocate a chunk of memory and bind it to the GART.
That call would return whatever opaque ID that can be used for processes
to later mmap that into their space and the offset into AGP aperture where
it was bound.
Of course, we need to provide the opposite call for disposing of it, but in
this case, we probably don't need to be smart regarding processes who still
have it mapped as it's typically a fatal thing or programming error.
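
The proposed semantics can be sketched as a toy allocator in Python. This is an illustration under stated assumptions, not the agpgart API: `GartAllocator`, `alloc_and_bind`, and `free` are hypothetical names, the page size is assumed to be 4096 bytes, and first-fit placement is an arbitrary choice. The point is only the shape of the interface: one call allocates and binds a chunk, returning an opaque ID plus the aperture offset, and one call disposes of it.

```python
import itertools

PAGE_SIZE = 4096  # assumed page size


class GartAllocator:
    """Toy model: allocate a chunk and bind it into the aperture in one call."""

    def __init__(self, aperture_pages):
        self.aperture_pages = aperture_pages
        self.chunks = {}                # handle -> (first_page, n_pages)
        self._ids = itertools.count(1)  # source of opaque IDs

    def alloc_and_bind(self, size_bytes):
        """Return (opaque_id, aperture_offset_in_bytes) for a new chunk."""
        n_pages = -(-size_bytes // PAGE_SIZE)  # round up to whole pages
        start = 0
        # First fit: walk bound chunks in address order looking for a gap.
        for first, count in sorted(self.chunks.values()):
            if first - start >= n_pages:
                break
            start = first + count
        if start + n_pages > self.aperture_pages:
            raise MemoryError("no room left in the aperture")
        handle = next(self._ids)
        self.chunks[handle] = (start, n_pages)
        return handle, start * PAGE_SIZE

    def free(self, handle):
        """Unbind and release a chunk; stale user mappings are the caller's bug."""
        del self.chunks[handle]


gart = GartAllocator(aperture_pages=16)
ring_id, ring_offset = gart.alloc_and_bind(2 * PAGE_SIZE)
ib_id, ib_offset = gart.alloc_and_bind(3 * PAGE_SIZE + 1)
print(ring_id, ring_offset, ib_id, ib_offset)
```

A process would then use the returned ID to map the chunk into its own address space; that step is left out here because the thread does not specify it.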

This avoids the problem with current agpgart which is to allow more or
l

Re: [Dri-devel] Newbie to DRI development

2002-06-18 Thread trhosiaw

I think their visualization interests cover a wide range of fields such as
fluids, astrophysics, molecular chemistry, etc.

They are primarily interested in, and want, everything to be open-source if possible.

I see that you're also from HP. Actually, another person from HP is supposed to get
back to us about possible video cards to run on the donated Compaq ES45 server
that I will be using.

Tom

> Tom,
>   Just out of curiosity, what type of visualization applications
> are you going to be running?  (Are any of them open-source?)
>
> Thanks,
> --Phil
>
> Hewlett-Packard: High Performance Technical Computing/Visualization
> ---
> [EMAIL PROTECTED]   Performance/Development
>
>
> On Tue, 18 Jun 2002 [EMAIL PROTECTED] wrote:
>
> > No, it's the University of Western Ontario
> >
> > It's not a class assignment, I'm working for SHARCNet (see this for more info
> > http://www.sharcnet.ca/org_corner/) at UWO. I was hired for the summer by one of
> > the professors in charge of SHARCNet to get video drivers working and optimized
> > on Compaq ES45 servers for scientific visualizations
> >
> > I've started looking at the radeon drivers, and they seem to be better
> > documented and commented so hopefully that will help
> >
> > Tom
> >
> > > [EMAIL PROTECTED] wrote:
> > > >
> > > > Hi, I'm working at a University where they would like me to start the
> > > > development of video drivers for an ATI Radeon or 3Dlabs card to run on Alpha
> > > > Linux(Red Hat).
> > >
> > > I see uwo.ca in your e-mail address.  Is that the University of Waterloo,
> > > Ontario?  Are you the only student working on this, or is this a class
> > > assignment?
> > >
> > > Just curious.
> > >
> > > -- /\
> > >  Jens Owen/  \/\ _
> > >   [EMAIL PROTECTED]  /\ \ \   Steamboat Springs, Colorado
> > >


Re: [Dri-devel] Newbie to DRI development

2002-06-18 Thread trhosiaw

No, it's the University of Western Ontario

It's not a class assignment, I'm working for SHARCNet (see this for more info
http://www.sharcnet.ca/org_corner/) at UWO. I was hired for the summer by one of
the professors in charge of SHARCNet to get video drivers working and optimized
on Compaq ES45 servers for scientific visualizations

I've started looking at the radeon drivers, and they seem to be better
documented and commented so hopefully that will help

Tom

> [EMAIL PROTECTED] wrote:
> > 
> > Hi, I'm working at a University where they would like me to start the
> > development of video drivers for an ATI Radeon or 3Dlabs card to run on Alpha
> > Linux(Red Hat).
> 
> I see uwo.ca in your e-mail address.  Is that the University of Waterloo,
> Ontario?  Are you the only student working on this, or is this a class
> assignment?
> 
> Just curious.
> 
> -- /\
>  Jens Owen/  \/\ _
>   [EMAIL PROTECTED]  /\ \ \   Steamboat Springs, Colorado
> 


[Dri-devel] gcc 3.1: finally OK but VT still problematic

2002-06-18 Thread Sergey V. Udaltsov

Hi all

Finally, I've got my kernel built with gcc 3.1 (actually, my problems
were in some mystical gcc296 in some compat package). And - wow - mach64
0-0-4 branch works for me! Great thanks to everyone. Even 2D seems to be
OK these days. The only problem I noticed is VT switching. When I switch
to the first VT, my X crashes. What could this be? Any way to track?

Great thanks to all of you folks.

Sergey





Re: [Dri-devel] radeon driver on current trunk

2002-06-18 Thread Michael

On Mon, Jun 17, 2002 at 02:26:49AM +0200, Michel Dänzer wrote:
> - Portability fixes for the new driver:
> http://penguinppc.org/~daenzer/DRI/radeon-endianness.diff
> Feedback and testing appreciated as always, in particular on the changes
> to the x86 specific parts.

I've been running with this for a few hours now, seems fine.

-- 
Michael.


Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Linus Torvalds



On Tue, 18 Jun 2002, Linus Torvalds wrote:
>
> So moving pages that way is definitely not cheap either. Hmm.

In fact, considering the cache and multi-CPU overhead, it's likely to be
faster to just memcpy() the damn thing from a regular cached mapping to an
existing AGP-mapped page.  Which is pretty much what we do right now in
kernel space.

Playing VM games tends to be slow for normal mappings thanks to TLB
effects, and playing VM games with AGP stuff is taking that slowness to a
new level. In addition to the TLB effects you now have cache effects
and GART mapping updates. And cache effects are much worse than TLB
effects ever were, simply because caches are a damn lot bigger (not to
mention the fact that the x86 has very limited cache control).

So when it comes to mmap, I think you should either just map the AGP pages
uncached into user space in the first place (so that you don't have any
cache coherency problems at run-time) or you're better off doing the
existing memcpy(). Moving pages is too painful.

Linus



Re: [Dri-devel] Newbie to DRI development

2002-06-18 Thread Jens Owen

[EMAIL PROTECTED] wrote:
> 
> Hi, I'm working at a University where they would like me to start the
> development of video drivers for a ATI Radeon or 3Dlabs card to run on Alpha
> Linux(Red Hat).

I see uwo.ca in your e-mail address.  Is that the University of Waterloo,
Ontario?  Are you the only student working on this, or is this a class
assignment?

Just curious.

-- /\
 Jens Owen/  \/\ _
  [EMAIL PROTECTED]  /\ \ \   Steamboat Springs, Colorado


Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Linus Torvalds



On Mon, 17 Jun 2002, Benjamin Herrenschmidt wrote:
>
> > mmap() and AGP driver gives access to IOMEM/AGP
>
> That one is problematic. I don't support the mmap interface properly
> on Apple chipsets for example, because they don't support the AGP
> aperture being accessed by the CPU.

I assume you mean that the CPU doesn't honour the AGP mappings, but the
CPU _can_ access the physical pages themselves. How do you do it right
now, since we seem to be doing "ioremap_nocache()" all over the place with
the AGP aperture?

But fundamentally that should not be a problem: we can map the (unmapped)
AGP pages one page at a time (rather than as one contiguous block of
remapped pages) into user mode.

I thought AGP already supported a mmap() interface, and if it really
doesn't, it should be trivial to do...

 [ Time passes, Linus looks at the sources ]

Ok, there does seem to be mmap() support in the AGP module, but it seems
to use that stupid "remap_page_range()" and the AGP base (similar to
ioremap() inside the kernel), so it does seem to mmap the _mapped_ AGP
area.

It would be possible to just install a "nopage" handler, and map one page
at a time on demand from the pool of (non-GART-mapped) pages that we keep
in the gatt_table[] or whatever.
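
The on-demand idea can be shown with a small Python model. It is only a sketch: `DemandMappedArea` and its methods are hypothetical, the list of `bytearray` pages stands in for the pool of backing pages, and the `_nopage` method plays the role of a fault handler that installs one page at the first access instead of remapping a contiguous range up front.

```python
PAGE_SIZE = 4096  # assumed page size


class DemandMappedArea:
    """Toy model of a mapping whose pages are installed by a fault handler."""

    def __init__(self, backing_pages):
        self.backing_pages = backing_pages  # stand-in for the backing page pool
        self.page_table = {}                # page index -> installed page
        self.faults = 0

    def _nopage(self, index):
        """Fault handler: install exactly one page on first touch."""
        if index >= len(self.backing_pages):
            raise IndexError("access beyond the mapped area")
        self.faults += 1
        self.page_table[index] = self.backing_pages[index]
        return self.page_table[index]

    def _page_for(self, offset):
        index, within = divmod(offset, PAGE_SIZE)
        page = self.page_table.get(index)
        if page is None:
            page = self._nopage(index)
        return page, within

    def read(self, offset):
        page, within = self._page_for(offset)
        return page[within]

    def write(self, offset, value):
        page, within = self._page_for(offset)
        page[within] = value


pool = [bytearray(PAGE_SIZE) for _ in range(4)]
area = DemandMappedArea(pool)
area.write(2 * PAGE_SIZE + 5, 0xAB)      # first touch of page 2 faults it in
value = area.read(2 * PAGE_SIZE + 5)     # second access finds it installed
print(value, area.faults, sorted(area.page_table))
```

Only the page that was actually touched ends up in the page table, which is the property the on-demand approach is after. The model ignores caching attributes and unbinding entirely.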

Maybe there is some reason for doing it that way that I don't understand.
More likely, it's just done that way because it was the simple and stupid
approach.

However, you seem to prefer a different approach, which would certainly
work:

> I would much prefer the agpgart interface to be redesigned around
> different semantics, mostly vmalloc() some space to use as AGP memory,
> then bind that to the GART, but don't rely on direct AGP aperture
> access.
>
> There are also some slight speed improvements to win using this
> scheme as I could map the AGP memory as cacheable (which would give a
> significant boost on PPC) provided buffers & ring get properly flushed
> before being "passed" to the chip.

Hmm.. It would be fairly simple to do all page allocation in user space,
and have an interface that says "put the physical page corresponding to my
virtual address  into the AGP aperture at offset ".

This would effectively disallow the above "map by unmapped page" approach,
because it's too damn expensive to find and flush any existing mappings
when somebody maps in a new page. And if not all systems support the
GART-assisted CPU mapping that we do now, that means that nobody can mmap
the AGP area into memory.

The expensive part would be the "mark this page uncacheable" when moving
it to the AGP buffer, which implies a cross-CPU TLB flush for each such
page. So moving a page into the AGP aperture is fundamentally a fairly
expensive operation: wbinvd itself takes a _loong_ time, but if you have
to do it on all CPU's along with the TLB flush, it gets _really_
expensive.

So moving pages that way is definitely not cheap either. Hmm.

Linus



Re: [Dri-devel] Newbie to DRI development

2002-06-18 Thread Michel Dänzer

On Tue, 2002-06-18 at 17:31, [EMAIL PROTECTED] wrote:
> Hi, I'm working at a University where they would like me to start the
> development of video drivers for a ATI Radeon or 3Dlabs card to run on Alpha
> Linux(Red Hat).

I'd expect the radeon driver to work on alpha, have you tried it?

> I've read all the documentation on your website, and some more general
> information in other places but I'm having a problem seeing where to start.  The
> documentation on the website is more of an overview(big picture) of DRI, rather
> than details of what needs to be implemented and how to implement them in the
> drivers and so I'm completely stuck.
> 
> I've taken a look at some of the code, specifically for the i810 card as is
> suggested for beginners but it's hard to make much sense out of it when
> commenting and documentation of code is very sparse. How am I supposed to know
> what functions need to be implemented or even their prototypes?
> 
> Any help would be of great assistance, as right now I'm just completely lost

In case it doesn't work yet, fixing it (for powerpc) has been a good way
for me to get familiar with it.


-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member   /  CS student, Free Software enthusiast


[Dri-devel] Newbie to DRI development

2002-06-18 Thread trhosiaw

Hi, I'm working at a University where they would like me to start the
development of video drivers for an ATI Radeon or 3Dlabs card to run on Alpha
Linux(Red Hat).

I've read all the documentation on your website, and some more general
information in other places but I'm having a problem seeing where to start.  The
documentation on the website is more of an overview(big picture) of DRI, rather
than details of what needs to be implemented and how to implement them in the
drivers and so I'm completely stuck.

I've taken a look at some of the code, specifically for the i810 card as is
suggested for beginners but it's hard to make much sense out of it when
commenting and documentation of code is very sparse. How am I supposed to know
what functions need to be implemented or even their prototypes?

Any help would be of great assistance, as right now I'm just completely lost

Thanks
  Tom


[Dri-devel] Re: Dri-devel digest, Vol 1 #1470 - 11 msgs

2002-06-18 Thread Brian S. Julin



On Tue, 18 Jun 2002 [EMAIL PROTECTED] wrote:
> From: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
> To: Linus Torvalds <[EMAIL PROTECTED]>,
> 
> > - Interrupts
> >
> > You don't use these right now, and as far as I can tell the main
> > reason for using them would be to just synchronize page flipping
> > with the framerate. No?
> 
> Which would be nice for getting proper frame-sync on interlaced displays
> (especially with Michel Dänzer's work on using DRM for Xv blits).

My two cents -- add this to the fbdev driver in a way that allows 
DRM to use it from userspace.  Matroxfb already has an IOCTL set,
which could be expanded to allow running an automatic page-flipper,
and perhaps have a hook on which to hang any handlers for DRM 
drivers that for some reason really do need a kernel module.

Simple apps that don't use X or much 3d (directfb, GGI, SDL) 
would really benefit from a unified pageflip/retrace API for fbdev, 
and if that allows DRI development to take place in userspace 
(and in so doing speed up development) so much the better.
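
For illustration, here is a minimal C sketch of a userspace page flip that waits for retrace through fbdev ioctls. It uses FBIOGET_VSCREENINFO, FBIOPAN_DISPLAY and FBIO_WAITFORVSYNC from <linux/fb.h>. FBIO_WAITFORVSYNC was added to mainline well after this thread and not every driver implements it, so this is not the unified API proposed above, only one way such an interface can look. The helper names next_page_yoffset and fb_flip, the two-page vertical layout and the use of CRTC 0 are assumptions.

```c
#include <assert.h>
#include <linux/fb.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Pure helper: with two pages stacked vertically in the virtual
 * framebuffer, return the yoffset of the page that is NOT shown now. */
uint32_t next_page_yoffset(uint32_t yres, uint32_t cur_yoffset)
{
    return cur_yoffset == 0 ? yres : 0;
}

/* Flip to the hidden page after waiting for vertical retrace.
 * Returns 0 on success, -1 on any failure. */
int fb_flip(int fd)
{
    struct fb_var_screeninfo var;
    uint32_t crtc = 0; /* assumption: the first CRTC */

    if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0)
        return -1;

    /* Double buffering needs a virtual height of at least two pages. */
    if (var.yres_virtual < 2 * var.yres)
        return -1;

    var.yoffset = next_page_yoffset(var.yres, var.yoffset);

    /* Block until retrace, then pan the display to the other page. */
    if (ioctl(fd, FBIO_WAITFORVSYNC, &crtc) < 0)
        return -1;
    return ioctl(fd, FBIOPAN_DISPLAY, &var) < 0 ? -1 : 0;
}
```

A caller would open /dev/fb0, draw into the hidden page through an mmap of the framebuffer, and call fb_flip(fd) once per frame.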

--
Brian S. Julin






Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Jeff Hartmann

Keith Whitwell wrote:

> Benjamin Herrenschmidt wrote:
> 
>>> HOWEVER, if you tied the GART mapping to the DRM lock, you might be ok.
>>> That gives you the required system exclusion, and if you make it an
>>> explicit "get my GART context" function that is only called under the
>>> DRM lock _and_ only called when you actually need the AGP access, you
>>> also avoid the unnecessary context switches.
>>> 
>>> You might still have some performance issues simply because you would
>>> do extra work when switching aperture mappings, but hopefully the GART
>>> switch wouldn't be a common operation.
>>> 
>>> The flexibility you would get _might_ be worth it.
>>> 
>> 
>> Well, I would personally vote for the processes _not_ relying on having
>> the AGP aperture mapped directly, but instead, the various memory pages
>> making their AGP aperture. Several chipsets (Apple ones for sure, but it
>> seems others are hitting this too nowadays) don't support AGP aperture
>> accesses from the CPU.
> 
> 
> What are you actually saying, that pages mapped in agp can't be 
> written by any means, or just that they can't be written through the 
> agp address range?
> 
> It sounds kind of broken to me in any case.  How do mtrrs work in this 
> world? 

Actually we should go to this model eventually.  However, it requires that I 
find time to finish the Page Attribute Table support I started on at 
VA.  This allows write combining to be set on a per page basis, and is 
the direction we want to go even on x86.

> 
> 
>> That way, if you want several AGP contexts, you can have the processes
>> tapping their AGP buffers without lock, locking would only be required
>> once it's time to move one of these buffers in/out the physical GART
>> under the arbitration of the DRM.
> 
> 
> You don't need to lock to write to agp buffers in the current scheme.
> 
> You also don't need to play with the gart table just to draw a 
> 2-triangle  strip.  On some chipsets, particularly under smp, 
> modifying the gart table is  very slow.  Ask Jeff about this.
> 
> Keith
> 
   This is also true, but I've done a lot of heavy thinking on this very 
issue.  The key is to manage the agp aperture and only swap out regions 
when you absolutely have to.  The big key to getting something like 
this to work is a memory manager that every client uses, and that is based 
on some sort of sarea.  It should be designed with a certain minimum block 
size, and have a few different flags for what kind of usage that memory 
block has.  (I can go into more detail on design, but you probably have 
a good idea what I mean here.)  Then the next step is to create kernel 
calls which can swap things to and from agp space and the card.  On 
cards that support it, another path (which prevents GART rewrites 
entirely) is to add support to swap to normal cached memory.
   This is what I envision making sense in the long run.  A global 
memory manager using an sarea (doesn't have to be the main one) and a 
good aging mechanism gets us most of the way there.
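
A rough C sketch of the kind of manager described above: fixed minimum-size blocks, a usage flag per block, and an age stamp used to pick what to swap out. Every name here (ap_manager, ap_alloc, the usage flags), the 64 KB block size and the 8-block aperture are assumptions made for illustration. In a real design the table would live in a shared area that every client maps, and the eviction would be followed by a kernel call that actually moves the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AP_BLOCK_SIZE (64u * 1024u) /* assumed minimum block size */
#define AP_NUM_BLOCKS 8u            /* assumed aperture size in blocks */

enum ap_usage { AP_FREE = 0, AP_TEXTURE, AP_VERTEX, AP_PINNED };

struct ap_block {
    enum ap_usage usage; /* what the block currently holds */
    int owner;           /* client id, -1 if none */
    uint32_t age;        /* stamp of last use; smaller means older */
};

struct ap_manager {
    struct ap_block blocks[AP_NUM_BLOCKS];
    uint32_t clock; /* global use counter (wrap-around ignored here) */
};

void ap_init(struct ap_manager *m)
{
    m->clock = 0;
    for (size_t i = 0; i < AP_NUM_BLOCKS; i++) {
        m->blocks[i].usage = AP_FREE;
        m->blocks[i].owner = -1;
        m->blocks[i].age = 0;
    }
}

/* Byte offset of block i inside the aperture. */
size_t ap_offset(size_t i)
{
    return i * AP_BLOCK_SIZE;
}

/* Record that block i was just used. */
void ap_touch(struct ap_manager *m, size_t i)
{
    assert(i < AP_NUM_BLOCKS);
    m->blocks[i].age = ++m->clock;
}

/* Give a block to `owner`. Prefers a free block; otherwise evicts the
 * oldest block that is not pinned. Returns the block index, or -1 if
 * every block is pinned. *evicted_owner is the previous owner, or -1. */
int ap_alloc(struct ap_manager *m, int owner, enum ap_usage usage,
             int *evicted_owner)
{
    int victim = -1;

    *evicted_owner = -1;
    for (size_t i = 0; i < AP_NUM_BLOCKS; i++) {
        const struct ap_block *b = &m->blocks[i];
        if (b->usage == AP_FREE) {
            victim = (int)i;
            break;
        }
        if (b->usage == AP_PINNED)
            continue;
        if (victim < 0 || b->age < m->blocks[victim].age)
            victim = (int)i;
    }
    if (victim < 0)
        return -1;
    if (m->blocks[victim].usage != AP_FREE)
        *evicted_owner = m->blocks[victim].owner;
    m->blocks[victim].usage = usage;
    m->blocks[victim].owner = owner;
    ap_touch(m, (size_t)victim);
    return victim;
}
```

The point of the age stamp is that a region is only swapped out when nothing is free, and the least recently used one goes first, which keeps GART rewrites rare.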

-Jeff








Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Benjamin Herrenschmidt

>What are you actually saying, that pages mapped in agp can't be written by any 
>means, or just that they can't be written through the agp address range?

Through the AGP address range. I work around this by hacking the DRM to
map the RAM pages directly in drmMap using specific vmops and a hacked
agp_ioremap. AFAIK, ia64 has similar limitations, and I've been told that
other recent bridges, at least those used on non-x86, share this design
"mistake".

>It sounds kind of broken to me in any case.  How do mtrrs work in this world?

They don't exist ;)

>> That way, if you want several AGP contexts, you can have the processes
>> tapping their AGP buffers without lock, locking would only be required
>> once it's time to move one of these buffers in/out the physical GART
>> under the arbitration of the DRM.
>
>You don't need to lock to write to agp buffers in the current scheme.
>
>You also don't need to play with the gart table just to draw a 2-triangle 
>strip.  On some chipsets, particularly under smp, modifying the gart table is 
>very slow.  Ask Jeff about this.

Ok.

Ben.






Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Keith Whitwell

Benjamin Herrenschmidt wrote:
>>HOWEVER, if you tied the GART mapping to the DRM lock, you might be ok.
>>That gives you the required system exclusion, and if you make it an
>>explicit "get my GART context" function that is only called under the DRM
>>lock _and_ only called when you actually need the AGP access, you also
>>avoid the unnecessary context switches.
>>
>>You might still have some performance issues simply because you would do
>>extra work when switching aperture mappings, but hopefully the GART switch
>>wouldn't be a common operation.
>>
>>The flexibility you would get _might_ be worth it.
>>
> 
> Well, I would personally vote for the processes _not_ relying on having
> the AGP aperture mapped directly, but instead, the various memory pages
> making their AGP aperture. Several chipsets (Apple ones for sure, but it
> seems others are hitting this too nowadays) don't support AGP aperture
> accesses from the CPU.

What are you actually saying, that pages mapped in agp can't be written by any 
means, or just that they can't be written through the agp address range?

It sounds kind of broken to me in any case.  How do mtrrs work in this world?

> That way, if you want several AGP contexts, you can have the processes
> tapping their AGP buffers without lock, locking would only be required
> once it's time to move one of these buffers in/out the physical GART
> under the arbitration of the DRM.

You don't need to lock to write to agp buffers in the current scheme.

You also don't need to play with the gart table just to draw a 2-triangle 
strip.  On some chipsets, particularly under smp, modifying the gart table is 
very slow.  Ask Jeff about this.

Keith








Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Benjamin Herrenschmidt

> - Interrupts
>
>   You don't use these right now, and as far as I can tell the main
>   reason for using them would be to just synchronize page flipping
>   with the framerate. No?

Which would be nice for getting proper frame-sync on interlaced displays
(especially with Michel Dänzer's work on using DRM for Xv blits).

> - IOIO and IOMEM access
>
>   iopl() gives access to IOIO

Which sucks on non-x86, but here XFree has its own stuff anyway.

>   mmap() and AGP driver gives access to IOMEM/AGP

That one is problematic. I don't support the mmap interface properly
on Apple chipsets for example, because they don't support the AGP
aperture being accessed by the CPU. I play mapping tricks for the
in-kernel mapping of the aperture (using a home-made agp_ioremap in
the DRM) and I use special vm_ops for drmMap of the AGP so that the
real mem pages get mapped in the client processes.
I could do the same with the AGP driver, though the main problem with
it currently is that clients using it via the ioctl interface tend to
first mmap the aperture, then bind/unbind memory to/from it.

I don't say that can't be fixed though ;)

I would much prefer the agpgart interface to be redesigned around
different semantics: mostly vmalloc() some space to use as AGP memory,
then bind that to the GART, but don't rely on direct AGP aperture
access.

There are also some slight speed improvements to win using this
scheme as I could map the AGP memory as cacheable (which would give a
significant boost on PPC) provided buffers & ring get properly flushed
before being "passed" to the chip.
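
To make the proposed semantics concrete, here is a small user-space model in C, not the agpgart API. The CPU only ever touches the backing pages it allocated; binding just points GART slots at those pages so the device sees them at an aperture offset. All names (gart_bind, gart_device_read, agp_buf_alloc), the 16-slot table and the 4 KB page size are assumptions, and calloc() stands in for vmalloc(). The cache flush needed before handing a cacheable buffer to the chip is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SZ    4096u
#define GART_SLOTS 16u

/* One pointer per aperture page; NULL means nothing is bound there. */
struct gart {
    uint8_t *slot[GART_SLOTS];
};

/* CPU-visible backing memory for a buffer. */
struct agp_buf {
    uint8_t *pages;
    size_t npages;
};

/* Allocate zeroed backing pages (stand-in for vmalloc()). */
int agp_buf_alloc(struct agp_buf *b, size_t npages)
{
    b->pages = calloc(npages, PAGE_SZ);
    b->npages = b->pages ? npages : 0;
    return b->pages ? 0 : -1;
}

/* Bind the buffer at aperture page `first`. Fails if the range is out
 * of bounds or overlaps an existing binding. A real driver would flush
 * CPU caches for the buffer here if it was mapped cacheable. */
int gart_bind(struct gart *g, size_t first, const struct agp_buf *b)
{
    if (first > GART_SLOTS || b->npages > GART_SLOTS - first)
        return -1;
    for (size_t i = 0; i < b->npages; i++)
        if (g->slot[first + i] != NULL)
            return -1;
    for (size_t i = 0; i < b->npages; i++)
        g->slot[first + i] = b->pages + i * PAGE_SZ;
    return 0;
}

void gart_unbind(struct gart *g, size_t first, size_t npages)
{
    for (size_t i = 0; i < npages && first + i < GART_SLOTS; i++)
        g->slot[first + i] = NULL;
}

/* What the device would read at aperture byte offset `off`:
 * the byte value, or -1 if that aperture page is unbound. */
int gart_device_read(const struct gart *g, size_t off)
{
    size_t s = off / PAGE_SZ;

    if (s >= GART_SLOTS || g->slot[s] == NULL)
        return -1;
    return g->slot[s][off % PAGE_SZ];
}
```

Note that the client never dereferences an aperture address: it writes through b.pages, and only the bind/unbind step needs arbitration.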


Ben.







Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Benjamin Herrenschmidt

>HOWEVER, if you tied the GART mapping to the DRM lock, you might be ok.
>That gives you the required system exclusion, and if you make it an
>explicit "get my GART context" function that is only called under the DRM
>lock _and_ only called when you actually need the AGP access, you also
>avoid the unnecessary context switches.
>
>You might still have some performance issues simply because you would do
>extra work when switching aperture mappings, but hopefully the GART switch
>wouldn't be a common operation.
>
>The flexibility you would get _might_ be worth it.

Well, I would personally vote for the processes _not_ relying on having
the AGP aperture mapped directly, but instead mapping the various memory
pages that make up their AGP aperture. Several chipsets (Apple ones for
sure, but it seems others are hitting this too nowadays) don't support AGP
aperture accesses from the CPU.

That way, if you want several AGP contexts, you can have the processes
tapping their AGP buffers without lock, locking would only be required
once it's time to move one of these buffers in/out the physical GART
under the arbitration of the DRM.

Ben.







Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Michel Dänzer

On Tue, 2002-06-18 at 10:57, Keith Whitwell wrote:
> 
> >  - IOIO and IOMEM access
> > 
> > iopl() gives access to IOIO
> > mmap() and AGP driver gives access to IOMEM/AGP
> > 
> > IOIO is actually slightly slower in CPL3 than in CPL0, but it's
> > slower in CPU cycles, not in IO cycles. And since IO cycles
> > definitely dominate in IOIO (by orders of magnitude), this isn't
> > likely to be an issue.
> > 
> > And IOMEM is the same speed, since the only overhead for
> > user space is the TLB, and AGP mappings use the TLB even in kernel
> > space (vmalloc).
> 
> I'm not sure how this works.  Does the agp module have a facility to allow the 
> client to mmap the card mmio region & the framebuffer?  I wasn't aware of this.

I don't know about that, but framebuffer devices certainly do. Not sure
if they will after the ongoing API changes in 2.5 though.

And failing that, one could fall back to /dev/mem, like DGA clients. Not
that I advocate that. :)


Just two cents of mine...


-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member   /  CS student, Free Software enthusiast





Re: [Dri-devel] R200 kernel interfaces

2002-06-18 Thread Keith Whitwell

Linus Torvalds wrote:
> Keith,
>  I've got a silly question for you..
> 
> Why do you need a kernel driver at all for the R200?

I go into your mail below, but the only good answers I have are:
1) To allow us to mmap the framebuffer, agp and mmio regions (or to handle 
mmio for us without us mapping it)
2) Backwards compatibility.  The ddx module is shared with the radeon & wants 
to talk to a kernel module.  This can be worked around.

> There are a few things that the kernel can do for you:
> 
>  - Locking.
> 
>   However, there are better (and faster) locks available in user
>   space, namely the "futex" interface. They take some getting used
>   to, but you can have some _truly_ low-cost locking using them.
> 
>   Example library can be found at:
>   http://www.kernel.org/pub/linux/kernel/people/rusty/futex-2.0.tar.gz

I'm not sure how these are so much better in concept than our existing 
lock.  Both seem to have a userspace fast path (with a locked cycle) and a 
syscall/ioctl slow path on contention.

The implementation of our lock has various workstation-leftovers like 
infrastructure for real virtualization of the hardware (kernel does context 
switching on lock contention), which aren't really used.
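
The fast-path/slow-path split described above can be sketched with C11 atomics. This is only loosely modeled on the idea of a lock word holding a "held" bit plus the id of the last owner; the names (hw_lock, lock_take), the bit layout and the slow path are assumptions. In a real driver the slow path would be an ioctl that sleeps in the kernel, whereas here it is a stand-in that simply reports failure while someone else holds the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_HELD 0x80000000u /* assumed layout: top bit = held */

struct hw_lock {
    _Atomic uint32_t word; /* LOCK_HELD | owner, or last owner if free */
};

/* Fast path: a single locked compare-and-swap and no syscall. It only
 * succeeds if the lock is free AND this context was the last holder. */
bool lock_fast(struct hw_lock *l, uint32_t ctx)
{
    uint32_t expected = ctx;

    return atomic_compare_exchange_strong(&l->word, &expected,
                                          LOCK_HELD | ctx);
}

/* Slow-path stand-in: takes the lock if nobody holds it, otherwise
 * returns false (a real implementation would block in the kernel). */
bool lock_slow(struct hw_lock *l, uint32_t ctx)
{
    uint32_t old = atomic_load(&l->word);

    while (!(old & LOCK_HELD)) {
        /* On failure `old` is reloaded and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&l->word, &old, LOCK_HELD | ctx))
            return true;
    }
    return false;
}

/* Returns true if the lock was taken. *contended tells the caller the
 * slow path ran, i.e. another context may have changed shared state. */
bool lock_take(struct hw_lock *l, uint32_t ctx, bool *contended)
{
    *contended = false;
    if (lock_fast(l, ctx))
        return true;
    *contended = true;
    return lock_slow(l, ctx);
}

/* Drop the lock but leave our id behind as "last owner". */
void lock_release(struct hw_lock *l, uint32_t ctx)
{
    uint32_t expected = LOCK_HELD | ctx;
    bool ok = atomic_compare_exchange_strong(&l->word, &expected, ctx);

    assert(ok); /* releasing a lock we do not hold is a bug */
    (void)ok;
}
```

A client that keeps re-taking the lock with no one else in between never leaves the fast path; any interleaving by another context forces the slow path once, which doubles as a cheap "something may have changed" signal.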

>  - Interrupts
> 
>   You don't use these right now, and as far as I can tell the main
>   reason for using them would be to just synchronize page flipping
>   with the framerate. No?

Correct.

>  - IOIO and IOMEM access
> 
>   iopl() gives access to IOIO
>   mmap() and AGP driver gives access to IOMEM/AGP
> 
>   IOIO is actually slightly slower in CPL3 than in CPL0, but it's
>   slower in CPU cycles, not in IO cycles. And since IO cycles
>   definitely dominate in IOIO (by orders of magnitude), this isn't
>   likely to be an issue.
> 
>   And IOMEM is the same speed, since the only overhead for
>   user space is the TLB, and AGP mappings use the TLB even in kernel
>   space (vmalloc).

I'm not sure how this works.  Does the agp module have a facility to allow the 
client to mmap the card mmio region & the framebuffer?  I wasn't aware of this.

>  - Global datastructures
> 
>   I think you do the aging right now globally or something.
> 
>   What else? Right now you cache some stuff globally (the ring tail
>   ptr etc), but that isn't necessary: you can re-create the
>   information on demand after a lock acquisition (since it is only
>   needed when contention happens).

Contention gives us a hint to check if the cliprects have changed.  There's a 
fairly ugly mechanism for retrieving the new cliprects (drop hw lock, get a 
spin-type lock, send a request, get a reply, drop the spin-lock, re-acquire the 
hw lock).  However - the check to see if this is necessary is cheap and the 
cliprects aren't required that often anyway.
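
One way such a cheap check can work is a stamp comparison, sketched below in C. This assumes the server bumps a counter in shared memory whenever a drawable's cliprects change and the client remembers the last value it saw. The struct and function names are hypothetical, and the fetch callback stands in for the whole drop-lock/request/reply/re-lock round trip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared with the server, which increments it on every cliprect change. */
struct shared_stamp {
    uint32_t stamp;
};

/* Client-side cached view of one drawable. */
struct client_drawable {
    uint32_t last_stamp; /* stamp value when cliprects were last fetched */
    int num_cliprects;
};

/* Stand-in for the expensive round trip; returns a cliprect count. */
typedef int (*fetch_fn)(void);

/* Example fetch used only for illustration. */
int fetch_two_rects(void)
{
    return 2;
}

/* The cheap part: one load and one compare. */
bool cliprects_stale(const struct shared_stamp *s,
                     const struct client_drawable *d)
{
    return s->stamp != d->last_stamp;
}

/* Refetch only when the stamp moved. Returns true if a fetch happened. */
bool validate_cliprects(const struct shared_stamp *s,
                        struct client_drawable *d, fetch_fn fetch)
{
    if (!cliprects_stale(s, d))
        return false;
    d->num_cliprects = fetch();
    d->last_stamp = s->stamp;
    return true;
}
```

The client would call validate_cliprects() after a contended lock acquisition, so the ugly path runs only when the stamp actually changed.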


> So from what I can tell, a trusted entity doesn't strictly _need_ any
> kernel support.
> 
> Yes, kernel support (or indirect rendering) is needed for untrusted
> applications, but it might actually be interesting to see what a
> direct-rendering all-user-land implementation looks like. It has some
> debugging advantages, and it may actually make sense to start from a
> totally trusted app that goes as fast as humanly possible, and then when
> that has been optimized to death look at just where the interfaces make
> the most sense..

This is closer & closer to the Utah direct rendering model (not that I'm 
complaining...)  In that model synchronization was achieved by having the X 
server be the only entity to touch the mmio region, but the client had direct 
access to a (large) dma buffer which it could ask the X server (via extended 
X11 protocol) to dispatch for it.  The X server would take care of cliprect 
issues.

This actually worked pretty well, but was limited to a single direct client 
(second & subsequent clients would go indirect, maybe sw-indirect, I forget).  
A little bit of work could extend that fairly easily to multiple clients.

It also required that the direct client be run as root in order to mmap the 
framebuffer & dma region.

I think it's probably time to start considering a rewrite/redesign of the 3d 
infrastructure based around a minimalist approach.  There's just so much 
leftover code hanging around I have to ask what can be salvaged.

Keith


