Re: [PATCH 0/3] Add support for Tegra Activity Monitor

2014-11-10 Thread Terje Bergström
On 11.11.2014 06:29, Alexandre Courbot wrote:
> I think (after a quick look at devfreq's source) that you can avoid
> polling altogether if you set polling_ms to 0 in your
> devfreq_dev_profile instance. Then it is up to you to call
> update_devfreq() from your custom governor whenever it sees fit.
> 
> ACTMON support seems to overlap between being a governor (which reacts
> to ACTMON interrupts and calls update_devfreq() when needed) and part of
> a devfreq_dev_profile (get_dev_status() needs to use the actmon
> counters). If we keep your current design where the driver simply
> controls a clock, you could have the ACTMON driver obtain that clock,
> register its governor, create a non-polling devfreq_dev_profile that
> controls that clock, and just call devfreq_add_device() with both. Then
> we will have the benefit of being able to use ACTMON as well as the
> performance and powersave governors on EMC, and switch policies
> dynamically.

Another way to use it is that governor is just a governor. It takes in
load values and generates new target frequencies, and doesn't know
anything about HW.

Device profile is the one that enables threshold interrupts and disables
polling. Device profile receives the interrupt, retrieves new load
number from hardware, and calls update_devfreq().

This way we keep all HW specific code in device profile, and there's
potential to use a generic governor instead of writing your own.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] drm/nouveau: disable caching for VRAM BOs on ARM

2014-05-25 Thread Terje Bergström
On 23.05.2014 17:40, Alex Courbot wrote:
> On 05/23/2014 06:59 PM, Lucas Stach wrote:
> So after checking with more knowledgeable people, it turns out this is 
> the expected behavior on ARM and BAR regions should be mapped uncached 
> on GK20A. All the more reasons to avoid using the BAR at all.

This is actually specific to Tegra.

>> You may want to make yourself aware of all the quirks required for
>> sharing memory between the GPU and CPU on an ARM host. I think there are
>> far more involved than what you see now and writing an replacement for
>> TTM will not be an easy task.
>>
>> Doing away with the concept of two memory areas will not get you to a
>> single unified address space. You would have to deal with things like
>> not being able to change the caching state of pages in the systems
>> lowmem yourself. You will still have to deal with remapping pages that
>> aren't currently visible to the CPU (ok this is not an issue on Jetson
>> right now as it only has 2GB of RAM), because it's in systems highmem,
>> or even in a different LPAE area.
>>
>> You really want to be sure you are aware of all the consequences of
>> this, before considering this task.
> 
> Yep, that's why I am seeking advice here. My first hope is that with a 
> few tweaks we will be able to keep using TTM and the current nouveau_bo 
> implementation. But unless I missed something this is not going to be easy.
> 
> We can also use something like the patch I originally sent to make it 
> work, although not with good performance, on GK20A. Not very graceful, 
> but it will allow applications to run.
> 
> In the long run though, we will want to achieve better performance, and 
> it seems like a BO implementation targeted at UMA devices would also be 
> beneficial to quite a few desktop GPUs. So as tricky as it may be I'm 
> interested in gathering thoughts and why not giving it a first try with 
> GK20A, even if it imposes some limitations like having buffers in lowmem 
> in a first time (we can probably live with this one for a short while, 
> and 64 bits will also be coming to the rescue :))

I don't think lowmem or LPAE is any problem, if the memory manager is
designed with that in mind. Vast majority of the buffers kernel
allocates do not need to be touched in kernel space.

Actually I can't think of any buffers that we allocate on behalf of user
space that would need to be permanently mapped also to kernel. In case
or relocs only push buffer needs to be temporarily mapped to kernel.

Ultimately even relocs are not necessary if we expose GPU virtual
addresses directly to user space. But that's another topic.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 04/10] drm/nouveau/fb: add GK20A support

2014-05-01 Thread Terje Bergström
On 01.05.2014 07:49, Alexandre Courbot wrote:
> Agreed. Besides I hope the day won't come where we have to go through
> 2 layers of memory translation for the GPU...

That's actually the method of operation that gives us the best
performance. GPU likes big pages, and without IOMMU translation you'd
have a hard time finding enough contiguous physical memory.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] gpu: host1x: do not check previously handled gathers

2014-01-30 Thread Terje Bergström
On 07.01.2014 22:03, Erik Faye-Lund wrote:
> When patching gathers, we don't need to check against
> gathers with lower indices than the current one, as
> they are guaranteed to already have been handled.
> 
> Signed-off-by: Erik Faye-Lund 
> ---
> 
> Here's a trivial optimization I have been running with for a while.
> 
>  drivers/gpu/host1x/job.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index de5ec33..e965805 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
> @@ -534,7 +534,7 @@ int host1x_job_pin(struct host1x_job *job, struct device 
> *dev)
>  
>   g->base = job->gather_addr_phys[i];
>  
> - for (j = 0; j < job->num_gathers; j++)
> + for (j = i + 1; j < job->num_gathers; j++)
>   if (job->gathers[j].bo == g->bo)
>   job->gathers[j].handled = true;
>  

Hi,

Thanks. This looks good logically, and I ran some tests that agree.

Acked-By: Terje Bergstrom 

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why have another variable deciding a tracepoint?

2013-11-17 Thread Terje Bergström
On 15.11.2013 16:52, Steven Rostedt wrote:
> So let me get this straight. The recording of the cdma_push() data is
> slow, correct? What mapping are you talking about?
> 
>   trace_host1x_cdma_push(dev_name(cdma_to_channel(cdma)->dev),
>  op1, op2);
> 
> #define cdma_to_channel(cdma) container_of(cdma, struct host1x_channel, cdma)
> 
> That just references the cdma field of struct host1x_channel, to get
> the dev field, which does dev_name() that just gets the dev->init_name
> (if defined, or kobject_name() if not). And the two u32 fields that are
> passed in.
> 
> The tracepoint assigns with:
> 
>   TP_fast_assign(
>   __entry->name = name;
>   __entry->op1 = op1;
>   __entry->op2 = op2;
>   ),
> 
> So I still have to ask; what do you have to set up and take down here?

Oops, there's a piece missing that I didn't mention. What we want is a
full trace of what we're sending to the push buffer. See
hw/channel_hw.c/trace_write_gather(). It maps the buffer handed from
user space, dumps it to ftrace and unmaps it. That costs a lot of time
that we don't want to spend in normal operation and hence the special
condition.

trace_host1x_cdma_push() in the code you refer to just makes the dump
complete by adding some overhead opcodes that kernel needs to add. It
doesn't make sense to output that if we don't get the bulk from
trace_write_gather() and that's why it's also checking the same
conditional. It'd just make the trace look corrupted.

> It's documented in the code ;-)  (kidding)
> 
> Yeah, I need to get that out a bit more. That's actually how I found
> this code, I'm doing searches for all the locations that can benefit
> from TRACE_EVENT_CONDITION(), and I'm trying to fix them up. I'm still
> not convinced that this extra variable checking is required for this
> tracepoint.
> 
> As you state though, I'll have to get time to document CONDITION() and
> how and when to use it.

Sounds good. :-)

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why have another variable deciding a tracepoint?

2013-11-15 Thread Terje Bergström
On 15.11.2013 06:48, Steven Rostedt wrote:
> I've been reviewing different users of tracepoints and I stumbled
> across this:
> 
> drivers/gpu/host1x/cdma.c: host1x_cdma_push()
> 
>   if (host1x_debug_trace_cmdbuf)
>   trace_host1x_cdma_push(dev_name(cdma_to_channel(cdma)->dev),
>  op1, op2);
> 
> That host1x_debug_trace_cmdbuf is a variable that gets set by another
> debugfs file "trace_cmdbuf" that is custom to this driver.
> 
> Why?

This is because it takes a lot of time to prepare for dumping the cmdbuf
data, like mapping buffer and unmapping after tracing. We want to avoid
all that preparation time.

> The tracepoint host1x_cdma_push is already controlled by either ftrace
> or perf. If it gets enabled by perf or ftrace, it still wont be traced
> unless we also enable this trace_cmdbuf. Is there some reason for this?
> I can't figure it out from the change log: 6236451d83a720 ("gpu:
> host1x: Add debug support").
> 
> As tracepoints uses jump labels, there is no branch cost associated
> with them. That is, they are either a direct jump, or a nop (in most
> cases a nop). But here you added the overhead of a conditional branch
> depending on this variable.
> 
> If this is truly needed, then use TRACE_EVENT_CONDITION() for that
> tracepoint.

TRACE_EVENT_CONDITION() isn't documented, so I don't know how that would
help.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 0/4] gpu: host1x: Add syncpoint base support

2013-10-28 Thread Terje Bergström
On 14.10.2013 15:21, Arto Merilainen wrote:
> The host1x driver uses currently syncpoints statically from host1x point of
> view. If we do a wait inside a job, it always has a constant value to wait.
> host1x supports also doing relative syncpoint waits with respect to syncpoint
> bases. This allows doing multiple operations inside a single submit and
> waiting an operation to complete before moving to next one.
> 
> This set of patches adds support for syncpoint bases to host1x driver and
> enables the support for gr2d client.
> 
> I have tested the series using the host1x test application (available at [0],
> function test_wait_base() in tests/tegra/host1x/tegra_host1x_test.c) on 
> cardhu.
> I would appreciate help in reviewing the series and testing the patches
> on other boards.
> 
> Changes in v2:
> - Reordered various code blocks to improve code consistency
> - Functions host1x_syncpt_alloc() and host1x_syncpt_request() take now a 
> single
> bitfield argument instead of separate boolean arguments
> - Added a separate ioctl call for querying the base associated with some
> syncpoint
> 
> [0] https://gitorious.org/linux-host1x/libdrm-host1x
> 
> Arto Merilainen (4):
>   gpu: host1x: Add 'flags' field to syncpt request
>   gpu: host1x: Add syncpoint base support
>   drm/tegra: Deliver syncpoint base to user space
>   drm/tegra: Reserve base for gr2d
> 
>  drivers/gpu/host1x/dev.h   |  2 ++
>  drivers/gpu/host1x/drm/drm.c   | 25 +
>  drivers/gpu/host1x/drm/gr2d.c  |  2 +-
>  drivers/gpu/host1x/hw/channel_hw.c | 19 ++
>  drivers/gpu/host1x/hw/hw_host1x01_uclass.h |  6 
>  drivers/gpu/host1x/syncpt.c| 58 
> +-
>  drivers/gpu/host1x/syncpt.h| 10 +-
>  include/uapi/drm/tegra_drm.h   | 26 +-
>  8 files changed, 128 insertions(+), 20 deletions(-)
> 

The series,

Reviewed-by: Terje Bergstrom 

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/5] gpu: host1x: Add runtime pm support

2013-10-08 Thread Terje Bergström
On 08.10.2013 09:27, Arto Merilainen wrote:
> This series adds runtime pm support for host1x, gr2d and dc. It retains the
> current behaviour if CONFIG_PM_RUNTIME is not enabled.
> 
> The gr2d clock is enabled when a new job is submitted and disabled when
> the work is done. Due to parent->child relations between host1x->gr2d, this
> scheme enables and disables host1x clock.
> 
> For dc, the clocks are enabled in .probe and disabled in .remove via runtime
> pm instead of direct clock APIs.
> 
> Mayuresh is unfortunately not available to continue with the series and hence
> I will continue working on the patches.

The series,

Acked-By: Terje Bergstrom 

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] gpu: host1x: check relocs after all gathers are consumed

2013-10-08 Thread Terje Bergström
On 04.10.2013 23:18, Erik Faye-Lund wrote:
> The num_relocs count are passed to the kernel per job, not per gather.
> 
> For multi-gather jobs, we would previously fail if there were relocs in
> other gathers aside from the first one.
> 
> Fix this by simply moving the check until all gathers have been
> consumed.
> 
> Signed-off-by: Erik Faye-Lund 
> ---
>  drivers/gpu/host1x/job.c |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index c4e1050..c9ddff8 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
> @@ -436,10 +436,6 @@ static int validate(struct host1x_firewall *fw, struct 
> host1x_job_gather *g)
>   }
>   }
>  
> - /* No relocs should remain at this point */
> - if (fw->num_relocs)
> - err = -EINVAL;
> -
>  out:
>   return err;
>  }
> @@ -493,6 +489,10 @@ static inline int copy_gathers(struct host1x_job *job, 
> struct device *dev)
>   offset += g->words * sizeof(u32);
>   }
>  
> + /* No relocs should remain at this point */
> + if (fw.num_relocs)
> + return -EINVAL;
> +
>   return 0;
>  }

Good catch.

Acked-By: Terje Bergstrom 

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/5] ARM: tegra: Add host1x, dc and hdmi to Tegra114 device tree

2013-08-28 Thread Terje Bergström
On 28.08.2013 16:18, Thierry Reding wrote:
> I think that's not all. I have local patches that also introduce a v2 of
> host1x, because the number of syncpoints is different. There may also be
> other differences, but Terje might be more qualified to answer that.

Tegra4 host1x has an extra channel(totals 9), which caused bitfields in
a couple of registers to shift. The registers are mainly used in the
debug code to dump the channel FIFO. Same number of sync points as
Tegra3, but 12 wait bases.

Other changes are minor and driver already deals with them, for example
32-bit versus 16-bit sync point value comparison.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/27] drivers/gpu/host1x/drm: don't check resource with devm_ioremap_resource

2013-08-01 Thread Terje Bergström
On 23.07.2013 21:01, Wolfram Sang wrote:
> diff --git a/drivers/gpu/host1x/drm/hdmi.c b/drivers/gpu/host1x/drm/hdmi.c
> index 01097da..9ffece6 100644
> --- a/drivers/gpu/host1x/drm/hdmi.c
> +++ b/drivers/gpu/host1x/drm/hdmi.c
> @@ -1248,9 +1248,6 @@ static int tegra_hdmi_probe(struct platform_device 
> *pdev)
>   return err;
>  
>   regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> - if (!regs)
> - return -ENXIO;
> -
>   hdmi->regs = devm_ioremap_resource(&pdev->dev, regs);
>   if (IS_ERR(hdmi->regs))
>   return PTR_ERR(hdmi->regs);
> 

Looks good to me.

Reviewed-By: Terje Bergstrom 

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drm/tegra: add support for runtime pm

2013-06-10 Thread Terje Bergström
On 10.06.2013 23:43, Thierry Reding wrote:
> Can you post the corresponding wrappers to make it easier to discuss
> them? If they just wrap runtime PM calls then they don't solve the
> locality problems that Terje brought up.

I don't think the wrappers are applicable here. They're there in
downstream to fix a problem with runtime PM core: some host1x clients
require host1x to be active when they're processing, and some (dc) need
it only when CPU is programming DC or when a signal towards host1x is
pending. Runtime PM wants to keep parent enabled when any of the clients
are enabled.

We solved this downstream by enabling ignore_children for host1x. As a
side effect, we need to explicitly enable host1x when we're enabling one
of its clients that need host1x, and that's what the wrapper does.

We can fix this particular problem by moving calls to runtime PM to job.c.

Mayuresh, once you've managed to test your patches, please send the v2.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 0/7] Miscellaneous fixes to host1x

2013-05-31 Thread Terje Bergström
On 29.05.2013 13:26, Arto Merilainen wrote:
> This patch series fixes two issues in the host1x driver: First, the
> command buffer validation routine had vulnerabilities that were not
> detected in earlier testing. Second, the return codes of some
> functions were misleading or completely missing. This caused the
> driver to give wrong return codes also to the userspace.
> 
> The series is based on top of 3.10rc3. I have tested the patch series
> on cardhu by running host1x and gr2d test cases (available at [0]).
> I would appreciate any help in testing/reviewing these patches.
> 
> Changes from v1:
>  * Rebased on top of 3.10rc3
>  * Split firewall fixes to smaller patches
>  * Reworked no-reloc case in firewall code. Fix in v1 was not
>sufficient in all cases
>  * Dropped patch "Fix syncpoint wait return value" as it is not
>critical and discussion on it has not yet settled.
>  * Fixed style and whitespace issues
> 
> [0] https://gitorious.org/linux-host1x/libdrm-host1x
> 
> Arto Merilainen (5):
>   gpu: host1x: Check reloc table before usage
>   gpu: host1x: Copy gathers before verification
>   gpu: host1x: Fix memory access in syncpt request
>   gpu: host1x: Fix client_managed type
>   gpu: host1x: Rework CPU syncpoint increment
> 
> Terje Bergstrom (2):
>   gpu: host1x: Check INCR opcode correctly
>   gpu: host1x: Don't reset firewall between gathers
> 
>  drivers/gpu/host1x/dev.h  |   8 +--
>  drivers/gpu/host1x/drm/drm.c  |   3 +-
>  drivers/gpu/host1x/drm/gr2d.c |   2 +-
>  drivers/gpu/host1x/hw/cdma_hw.c   |   2 +-
>  drivers/gpu/host1x/hw/syncpt_hw.c |  12 ++--
>  drivers/gpu/host1x/job.c  | 135 
> +-
>  drivers/gpu/host1x/syncpt.c   |  26 +++-
>  drivers/gpu/host1x/syncpt.h   |  13 ++--
>  8 files changed, 85 insertions(+), 116 deletions(-)
> 

Arto's patches above,

Acked-By: Terje Bergstrom 

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drm/tegra: add support for runtime pm

2013-05-27 Thread Terje Bergström
On 27.05.2013 18:45, Thierry Reding wrote:
> On Mon, May 27, 2013 at 07:19:28PM +0530, Mayuresh Kulkarni wrote:
>> +#ifdef CONFIG_PM_RUNTIME
>> +static int host1x_runtime_suspend(struct device *dev)
>> +{
>> +struct host1x *host;
>> +
>> +host = dev_get_drvdata(dev);
>> +if (IS_ERR_OR_NULL(host))
> 
> I think a simple
> 
>   if (!host)
>   return -EINVAL;
> 
> would be enough here. The driver-data of the device should never be an
> ERR_PTR()-encoded value, but either a valid pointer to a host1x object
> or NULL.

True, we should avoid IS_ERR_OR_NULL() like plague. We always know if
the called API returns a NULL on error or an error code. In case of
error code we should just propagate that.

> Same comments apply here. Also I think it might be a good idea to split
> the host1x and gr2d changes into separate patches.

That's a bit tricky, but doable. We just need to enable it for 2D first,
and then host1x to keep bisectability.

>>  static void action_submit_complete(struct host1x_waitlist *waiter)
>>  {
>> +int completed = waiter->count;
>>  struct host1x_channel *channel = waiter->data;
>>  
>> +/* disable clocks for all the submits that got completed in this lot */
>> +while (completed--)
>> +pm_runtime_put(channel->dev);
>> +
>>  host1x_cdma_update(&channel->cdma);
>>  
>> -/*  Add nr_completed to trace */
>> +/* Add nr_completed to trace */
>>  trace_host1x_channel_submit_complete(dev_name(channel->dev),
>>   waiter->count, waiter->thresh);
>> -
>>  }
> 
> This feels hackish. But I can't see any better place to do this. Terje,
> Arto: any ideas how we can do this in a cleaner way? If there's nothing
> better then maybe moving the code into a separate function, say
> host1x_waitlist_complete(), might make this less awkward?

Yeah, it's a bit awkward. action_submit_complete() actually does handle
completion of multiple jobs, and we do one pm_runtime_get() per job.

We could do pm_runtime_put() in host1x_cdma_update(). It anyway goes
through each job that is completed, so while freeing the job it could as
well call runtime PM. That way we could even remove the waiter->count
variable altogether as it's not needed anymore.

The not-so-beautiful aspect is that we do pm_runtime_get() in
host1x_channel.c and pm_runtime_put() in host1x_cdma.c. For code
readability it's be great to have them in the same file. I actually get
questions every now and then because in downstream because of doing
these operations in different files.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv9 0/9] Support Tegra 2D hardware

2013-04-03 Thread Terje Bergström
On 22.03.2013 16:54, Thierry Reding wrote:
> On Fri, Mar 22, 2013 at 04:34:00PM +0200, Terje Bergstrom wrote:
>> This set of patches adds support for Tegra20 and Tegra30 host1x and
>> 2D. It is based on linux-next-20130322 with RTC fixes applied.
(...)
> For the series:
> 
> Reviewed-by: Thierry Reding 
> Tested-by: Thierry Reding 
> 
> For the Tegra DRM bits:
> 
> Acked-by: Thierry Reding 

Dave,

What's the status of this patch? The patch set is not yet merged in your
tree. Did I forget to do some step?

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv7 10/10] drm: tegra: Add gr2d device

2013-03-19 Thread Terje Bergström
On 15.03.2013 14:13, Thierry Reding wrote:
> Also you now create a lookup table (or bitfield actually) as we
> discussed but you still pass around a function to check the lookup table
> against. What I originally intended was to not pass a function around at
> all but directly use the lookup table/bitfield. However I think we can
> leave this as-is for now and change it later if required.

Yeah, I think it's better if we leave this as is now. We should actually
have one table for host1x and one for 2D. The host1x one should be
shared between the drivers, but the table for client unit should be
local to the driver.

Let's take this up again when we have another driver.

> 
>> +static int gr2d_probe(struct platform_device *pdev)
>> +{
>> + struct device *dev = &pdev->dev;
>> + struct host1x_drm *host1x = host1x_get_drm_data(dev->parent);
>> + int err;
>> + struct gr2d *gr2d = NULL;
>> + struct host1x_syncpt **syncpts;
> 
> I don't think there's a need for this temporary variable. You could just
> use gr2d->client.syncpts directly.

I actually ended up with pretty long lines when I tried this, so I hope
it's ok to leave as is.

> 
>> + gr2d = devm_kzalloc(dev, sizeof(*gr2d), GFP_KERNEL);
>> + if (!gr2d)
>> + return -ENOMEM;
>> +
>> + syncpts = devm_kzalloc(dev, sizeof(*syncpts), GFP_KERNEL);

F.ex. this line would split two two lines.

Otherwise the changes should be now pretty much ok. You sent a bunch of
changes (thanks) that I merged to my tree. I'll just need to do some
testing and re-split the patches and send v8.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv7 10/10] drm: tegra: Add gr2d device

2013-03-15 Thread Terje Bergström
On 15.03.2013 14:13, Thierry Reding wrote:
> On Wed, Mar 13, 2013 at 02:36:26PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
> [...]
>> +static inline struct host1x_drm_context *tegra_drm_get_context(
>> + struct list_head *contexts,
>> + struct host1x_drm_context *_ctx)
>> +{
>> + struct host1x_drm_context *ctx;
>> +
>> + list_for_each_entry(ctx, contexts, list)
>> + if (ctx == _ctx)
>> + return ctx;
>> + return NULL;
>> +}
> 
> Maybe make this a little more high-level, such as:
> 
> static bool host1x_drm_file_owns_context(struct host1x_drm_file *file,
>  struct host1x_drm_context *context)
> 
> ?

We only need the true/false value, so changing signature makes sense.

> 
>> +static int tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
>> + struct drm_file *file_priv)
>> +{
>> + struct tegra_drm_open_channel_args *args = data;
>> + struct host1x_client *client;
>> + struct host1x_drm_context *context;
>> + struct host1x_drm_file *fpriv = file_priv->driver_priv;
>> + struct host1x_drm *host1x = drm->dev_private;
>> + int err = -ENODEV;
>> +
>> + context = kzalloc(sizeof(*context), GFP_KERNEL);
>> + if (!context)
>> + return -ENOMEM;
>> +
>> + list_for_each_entry(client, &host1x->clients, list)
>> + if (client->class == args->class) {
>> + context->client = client;
> 
> Why do you assign this here? Should it perhaps be assigned only when the
> opening of the channel succeeds? The .open_channel() already receives a
> pointer to the client as parameter, so it doesn't have to be associated
> with the context.

True, we can move the assignment to happen after checking the
open_channel result.

> 
>> +static int tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
>> +  struct drm_file *file_priv)
>> +{
>> + struct tegra_drm_get_syncpoint *args = data;
>> + struct host1x_drm_file *fpriv = file_priv->driver_priv;
>> + struct host1x_drm_context *context =
>> + (struct host1x_drm_context *)(uintptr_t)args->context;
>> +
>> + if (!tegra_drm_get_context(&fpriv->contexts, context))
>> + return -ENODEV;
>> +
>> + if (context->client->num_syncpts < args->param)
>> + return -ENODEV;
> 
> I think this might be easier to read as:
> 
> if (args->param >= context->client->num_syncpts)
> return -ENODEV;

Ok, will do.

> 
>> + args->value = host1x_syncpt_id(context->client->syncpts[args->param]);
> 
> This could use a temporary variable "syncpt" to make it easier to read.

Ok.

> 
>> +static int tegra_drm_gem_create_ioctl(struct drm_device *drm, void *data,
>> +   struct drm_file *file_priv)
> 
> tegra_drm_ioctl_gem_create()?

Sure.

> 
>> +{
>> + struct tegra_drm_create *args = data;
>> + struct drm_gem_cma_object *cma_obj;
>> + struct tegra_drm_bo *cma_bo;
> 
> These can probably just be named "cma"/"gem" and "bo" instead.

Ok.

> 
>> +static int tegra_drm_ioctl_get_offset(struct drm_device *drm, void *data,
>> +   struct drm_file *file_priv)
> 
> I think this might be more generically named tegra_drm_ioctl_mmap()
> which might come in handy if we ever need to do something more than just
> retrieve the offset.

Yeah, that changes the API semantics a bit, but in general there
shouldn't be a need to just get the offset without doing the actual mmap.

> 
>> +{
>> + struct tegra_drm_get_offset *args = data;
>> + struct drm_gem_cma_object *cma_obj;
>> + struct drm_gem_object *obj;
>> +
>> + obj = drm_gem_object_lookup(drm, file_priv, args->handle);
>> + if (!obj)
>> + return -EINVAL;
>> + cma_obj = to_drm_gem_cma_obj(obj);
> 
> The above two lines should be separated by a blank line.

Ok.

> 
>> +
>> + args->offset = cma_obj->base.map_list.hash.key << PAGE_SHIFT;
> 
> Perhaps a better way would be to export the get_gem_mmap_offset() from
> drivers/gpu/drm/drm_gem_cma_helper.c and reuse that.

Ok, we'll add that as a patch to the series.

> 
>>  static struct drm_ioctl_desc tegra_drm_ioctls[] = {
>> + DRM_IOCTL_DEF_DRV(TEGRA_DRM_GEM_CREATE,
>> + tegra_drm_gem_create_ioctl, DRM_UNLOCKED | DRM_AUTH),
>> + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_READ,
>> + tegra_drm_ioctl_syncpt_read, DRM_UNLOCKED),
>> + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_INCR,
>> + tegra_drm_ioctl_syncpt_incr, DRM_UNLOCKED),
>> + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_WAIT,
>> + tegra_drm_ioctl_syncpt_wait, DRM_UNLOCKED),
>> + DRM_IOCTL_DEF_DRV(TEGRA_DRM_OPEN_CHANNEL,
>> +   

Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

2013-03-11 Thread Terje Bergström
On 11.03.2013 09:18, Thierry Reding wrote:
> This sound a bit over-engineered at this point in time. DRM is currently
> the only user. Is anybody working on any non-DRM drivers that would use
> this?

Well, this contains beginning of that:

http://nv-tegra.nvidia.com/gitweb/?p=linux-2.6.git;a=blob;f=drivers/media/video/tegra_v4l2_camera.c;h=644d0be5380367aca4c826c49724c03aad08387c;hb=l4t/l4t-r16-r2

I don't want to give these guys any excuse not to port it over to host1x
code base. :-)

> Even that aside, I don't think host1x_mem_handle is a good choice of
> name here. The objects are much more than handles. They are in fact
> buffer objects, which can optionally be attached to a handle. I also
> think that using a void * to store the handle specific data isn't such a
> good idea.

Naming if not an issue for me - we can easily agree on using _bo.

> So how about the following proposal, which I think might satisfy both of
> us:
> 
>   struct host1x_bo;
> 
>   struct host1x_bo_ops {
>   struct host1x_bo *(*get)(struct host1x_bo *bo);
>   void (*put)(struct host1x_bo *bo);
>   dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
>   ...
>   };
> 
>   struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
>   void host1x_bo_put(struct host1x_bo *bo);
>   dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
>   ...
> 
>   struct host1x_bo {
>   const struct host1x_bo_ops *ops;
>   ...
>   };
> 
>   struct tegra_drm_bo {
>   struct host1x_bo base;
>   ...
>   };
> 
> That way you can get rid of the host1x_memmgr_create_handle() helper and
> instead embed host1x_bo into driver-/framework-specific structures with
> the necessary initialization.

This would make sense. We'll get back when we have enough of
implementation done to understand it all. One consequence is that we
cannot use drm_gem_cma_create() anymore. We'll have to introduce a
function that does the same as drm_gem_cma_create(), but it takes a
pre-allocated drm_gem_cma_object pointer. That way we can allocate the
struct, and use DRM CMA just to initialize the drm_gem_cma_object.

Other way would be just taking a copy of DRM CMA helper, but I'd like to
defer that to the next step when we implement IOMMU aware allocator.

> It also allows you to interact directly with the objects instead of
> having to go through the memmgr API. The memory manager doesn't really
> exist anymore so keeping the name in the API is only confusing. Your
> current proposal deals with memory handles directly already so it's
> really just making the naming more consistent.

The memmgr APIs are currently just a shortcut wrapper to the ops, so in
that sense the memmgr does not really exist. I think it might still make
sense to keep static inline wrappers for calling the ops within, but we
could rename them to host1x_bo_somethingandother. Then it'd follow the
pattern we are using for the hw ops in the latest set.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

2013-03-10 Thread Terje Bergström
On 08.03.2013 22:43, Thierry Reding wrote:
> A bo is just a buffer object, so I don't see why the name shouldn't
> be used. The name is in no way specific to DRM or GEM. But the point
> that I was trying to make was that there is nothing to suggest that
> we couldn't use drm_gem_object as the underlying scaffold to base all
> host1x buffer objects on.
> 
> Furthermore I don't understand why you've chosen this approach. It
> is completely different from what other drivers do and therefore
> makes it more difficult to comprehend. That alone I could live with
> if there were any advantages to that approach, but as far as I can
> tell there are none.

I was following the plan we agreed on earlier in email discussion with
you and Lucas:

On 29.11.2012 11:09, Lucas Stach wrote:
> We should aim for a clean split here. GEM handles are something which
> is really specific to how DRM works and as such should be constructed
> by tegradrm. nvhost should really just manage allocations/virtual
> address space and provide something that is able to back all the GEM
> handle operations.
> 
> nvhost has really no reason at all to even know about GEM handles.
> If you back a GEM object by a nvhost object you can just peel out
> the nvhost handles from the GEM wrapper in the tegradrm submit ioctl
> handler and queue the job to nvhost using it's native handles.
> 
> This way you would also be able to construct different handles (like
> GEM obj or V4L2 buffers) from the same backing nvhost object. Note
> that I'm not sure how useful this would be, but it seems like a
> reasonable design to me being able to do so.

With this structure, we are already prepared for non-DRM APIs. Tt's a
matter of familiarity of code versus future expansion. Code paths for
both are as simple/complex, so neither has a direct technical
superiority in performance.

I know other DRM drivers have opted to hard code GEM dependency
throughout the code. Then again, host1x hardware is managing much more
than graphics, so we need to think outside the DRM box, too.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

2013-03-08 Thread Terje Bergström
On 26.02.2013 11:48, Terje Bergström wrote:
> On 25.02.2013 17:24, Thierry Reding wrote:
>> You use two different styles to indent the function parameters. You
>> might want to stick to one, preferably aligning them with the first
>> parameter on the first line.
> 
> I've generally favored "two tabs" indenting, but we'll anyway
> standardize on one.

We standardized on the convention used in tegradrm, i.e. aligning with
first parameter.

>> There's nothing in this function that requires a platform_device, so
>> passing struct device should be enough. Or maybe host1x_cdma should get
>> a struct device * field?
> 
> I think we'll just start using struct device * in general in code.
> Arto's been already fixing a lot of these, so he might've already fixed
> this.

We did a sweep in the code and now I hope everything that can, uses
struct device *. The side effect was getting rid of a lot of casting,
which is good.

>> Why don't you use any of the kernel's reference counting mechanisms?
>>
>>> +void host1x_channel_put(struct host1x_channel *ch)
>>> +{
>>> + mutex_lock(&ch->reflock);
>>> + if (ch->refcount == 1) {
>>> + host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma);
>>> + host1x_cdma_deinit(&ch->cdma);
>>> + }
>>> + ch->refcount--;
>>> + mutex_unlock(&ch->reflock);
>>> +}
>>
>> I think you can do all of this using a kref.
> 
> I think the original reason was that there's no reason to use atomic
> kref, as we anyway have to do mutual exclusion via mutex. But, using
> kref won't be any problem, so we could use that.

Actually, we ended up with a problem with this. kref assumes that once
refcount goes to zero, it gets destroyed. In ch->refcount, going to zero
is just fine and just indicates that we need to initialize. And, we
anyway need to do locking, so we didn't do the conversion to kref.

>>> +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
>>> +{
(...)
>>> +}
>>
>> I think the critical section could be shorter here. It's probably not
>> worth the extra trouble, though, given that channels are not often
>> allocated.
> 
> Yeah, boot time isn't measured in microseconds. :-) But, if we just make
> allocated_channels an atomic, we should be able to drop chlist_mutex
> altogether and it could simplify the code.

There wasn't much we could have moved outside the critical section, so
we didn't touch this area.

>> Also, is it really necessary to abstract these into an ops structure? I
>> get that newer hardware revisions might require different ops for sync-
>> point handling because the register layout or number of syncpoints may
>> be different, but the CDMA and push buffer (below) concepts are pretty
>> much a software abstraction, and as such its implementation is unlikely
>> to change with some future hardware revision.
> 
> Pushbuffer ops can become generic. There's only one catch - init uses
> the restart opcode. But the opcode is not going to change, so we can
> generalize that.

We ended up keeping the init as an operation, but rest of push buffer
ops became generic.

>>
>>> +/*
>>> + * Push two words to the push buffer
>>> + * Caller must ensure push buffer is not full
>>> + */
>>> +static void push_buffer_push_to(struct push_buffer *pb,
>>> + struct mem_handle *handle,
>>> + u32 op1, u32 op2)
>>> +{
>>> + u32 cur = pb->cur;
>>> + u32 *p = (u32 *)((u32)pb->mapped + cur);
>>
>> You do all this extra casting to make sure to increment by bytes and not
>> 32-bit words. How about you change pb->cur to contain the word index, so
>> that you don't have to go through hoops each time around.

When we changed DMASTART and DMAEND to actually denote the push buffer
area, we noticed that DMAGET and DMAPUT are actually relative to
DMASTART and DMAEND. This and the need to access both CPU and device
virtual addresses coupled with changing to word indexes didn't actually
simplify the code, so we kept still using byte indexes.

>>
>>> +/*
>>> + * Return the number of two word slots free in the push buffer
>>> + */
>>> +static u32 push_buffer_space(struct push_buffer *pb)
>>> +{
>>> + return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
>>> +}
>>
>> Why & (PUSH_BUFFER_SIZE - 1) here? fence - cur can never be larger than
>> PUSH_BUFFER_SIZE, can it?
&

Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-11 Thread Terje Bergström
On 10.02.2013 22:44, Thierry Reding wrote:
> On Sun, Feb 10, 2013 at 04:42:53PM -0800, Terje Bergström wrote:
>> You're right about performance. We already saw quite a bad performance
>> hit with the current firewall, so we'll need to worry about performance
>> later.
> 
> I guess the additional overhead of looking up in a table vs. an actual
> function being run will be rather small compared to the total overhead
> incurred by having the firewall in the first place.

Yeah, I'll just implement a simple linear table lookup and let's see
what happens. I'll optimize with bitfield if needed.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-10 Thread Terje Bergström
On 07.02.2013 23:07, Thierry Reding wrote:
> On Wed, Feb 06, 2013 at 01:23:17PM -0800, Terje Bergström wrote:
>>>> That's the security firewall. It walks through each submit, and ensures
>>>> that each register write that writes an address, goes through the host1x
>>>> reloc mechanism. This way user space cannot ask 2D to write to arbitrary
>>>> memory locations.
>>> I see. Can this be made more generic? Perhaps adding a table of valid
>>> registers to the device and use a generic function to iterate over that
>>> instead of having to provide the same function for each client.
>> For which one does gcc generate more efficient code? I've thought a
>> switch-case statement might get compiled into something more efficient
>> than a table lookup.
>> But the rest of the code is generic - just the one function which
>> compares against known address registers is specific to 2D.
> Table lookup should be pretty fast. I wouldn't worry too much about
> performance at this stage, though. Readability is more important in my
> opinion. A lookup table is a lot more readable and reusable I think. If
> it turns out that using a function is actually faster we can always
> optimize later.

You're right about performance. We already saw quite a bad performance
hit with the current firewall, so we'll need to worry about performance
later.

I'll take a look at converting the register list to a table. Instead of
always doing a linear search of a table, a bitfield might be more
appropriate.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-06 Thread Terje Bergström
On 05.02.2013 01:54, Thierry Reding wrote:
> On Mon, Feb 04, 2013 at 09:17:45PM -0800, Terje Bergström wrote:
>> Yeah, it's actually working around the host1x duplicate naming.
>> host1x_syncpt_get takes struct host1x as parameter, but that's different
>> host1x than in this code.
> 
> So maybe a better way would be to rename the DRM host1x after all. If it
> avoids the need for workarounds such as this I think it justifies the
> additional churn.

Ok, I'll include that. Do you have a preference for the name? Something
like "host1x_drm" might work?

>>> Also, how useful is it to create a context? Looking at the gr2d
>>> implementation for .open_channel(), it will return the same channel to
>>> whichever userspace process requests them. Can you explain why it is
>>> necessary at all? From the name I would have expected some kind of
>>> context switching to take place when different applications submit
>>> requests to the same client, but that doesn't seem to be the case.
>>
>> Hardware context switching will be a later submit, and it'll actually
>> create a new structure. Hardware context might live longer than the
>> process that created it, so they'll need to be separate.
> 
> Why would it live longer than the process? Isn't the whole purpose of
> the context to keep per-process state? What use is that state if the
> process dies?

Hardware context has to be kept alive for as long as there's a job
running from that process. If an app sends 10 jobs to 2D channel, and
dies immediately, there's no sane way for host1x to remove the jobs from
queue. The jobs will keep on running and kernel will need to track them.

>> Perhaps the justification is that this way we can keep the kernel API
>> stable even when we add support for hardware contexts and other clients.
> 
> We don't need a stable kernel API. But I guess it is fine to keep it if
> for no other reason to fill the context returned in the ioctl() with
> meaningful data.

Sorry, I meant stable IOCTL API, so we agree on this.

>> host1x_drm_file sounds a bit odd, because it's not really a file, but a
>> private data pointer stored in driver_priv.
> 
> The same is true for struct drm_file, which is stored in struct file's
> .private_data field. I find it to be very intuitive if the inheritance
> is reflected in the structure name. struct host1x_drm_file is host1x'
> driver-specific part of struct drm_file.

Ok, makes sense. I'll do that.

> I fail to see how dma_buf would require a separate mem_mgr_type. Can we
> perhaps postpone this to a later point and just go with CMA as the only
> alternative for now until we have an actual working implementation that
> we can use this for?

Each submit refers to a number of buffers. Some of them are the streams,
some are textures or other input/output buffers. Each of these buffers
might be passed as a GEM handle, or (when implemented) as a dma_buf fd.
Thus we need a field to tell host1x which API to call to handle that handle.

I think we can leave out the code for managing the type until we
actually have separate memory managers. That'd make GEM handles
effectively of type 0, as we don't set it.

> 
>>>> +static int gr2d_submit(struct host1x_drm_context *context,
>>>> + struct tegra_drm_submit_args *args,
>>>> + struct drm_device *drm,
>>>> + struct drm_file *file_priv)
>>>> +{
>>>> + struct host1x_job *job;
>>>> + int num_cmdbufs = args->num_cmdbufs;
>>>> + int num_relocs = args->num_relocs;
>>>> + int num_waitchks = args->num_waitchks;
>>>> + struct tegra_drm_cmdbuf __user *cmdbufs =
>>>> + (void * __user)(uintptr_t)args->cmdbufs;
>>>> + struct tegra_drm_reloc __user *relocs =
>>>> + (void * __user)(uintptr_t)args->relocs;
>>>> + struct tegra_drm_waitchk __user *waitchks =
>>>> + (void * __user)(uintptr_t)args->waitchks;
>>>
>>> No need for all the uintptr_t casts.
>>
>> Will try to remove - but I do remember getting compiler warnings without
>> them.
> 
> I think you shouldn't even have to cast to void * first. Just cast to
> the target type directly. I don't see why the compiler should complain.

This is what I get without them:

drivers/gpu/host1x/drm/gr2d.c:108:3: warning: cast to pointer from
integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:110:3: warning: cast to pointer from
integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:112

Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-06 Thread Terje Bergström
On 05.02.2013 01:15, Thierry Reding wrote:
> On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
>> This is used by debugfs code to direct to debugfs, and
>> nvhost_debug_dump() to send via printk.
> 
> Yes, that was precisely my point. Why bother providing the same data via
> several output methods. debugfs is good for showing large amounts of
> data such as register dumps or a tabular representation of syncpoints
> for instance.
> 
> If, however, you want to interactively show debug information using
> printk the same format isn't very useful and something more reduced is
> often better.

debugfs is there to be able to get a reliable dump of host1x state
(f.ex. no lines intermixed with other output).

printk output is there because often we get just UART logs from failure
cases, and having as much information as possible in the logs speeds up
debugging.

Both of them need to output the values of sync points, and the channel
state. Dumping all of that consists of a lot of code, and I wouldn't
want to duplicate that for two output formats.

>> I have another suggestion. In downstream I removed the decoding part and
>> I just print out a string of hex. That removes quite a bit bunch of code
>> from kernel. It makes the debug output also more compact.
> I don't know. I think if you use in-kernel debugging facilities such as
> debugfs or printk, then the output should be self-explanatory. However I
> do see the usefulness of having a binary dump that can be decoded in
> userspace. But I think if we want to go that way we should make that a
> separate interface. USB provides something like that, which can then be
> fed to libpcap or wireshark to capture and analyze USB traffic. If done
> properly you get replay functionality for free. I don't know what infra-
> structure exists to help with implementing something similar.

It's not actually binary. I think I misrepresented the suggestion.

I'm suggesting that we'd display only the contents of command FIFO and
contents of gathers (i.e. all opcodes) in hex format, not decoded. All
other text would remain as is, so syncpt values, etc would be readable
by a glance.

The user space tool can then take the streams and decode them if needed.

We've noticed that the decoded opcodes format can be very long and
sometimes takes a minute to dump out via a slow console. The hex output
is much more compact and faster to dump.

Actual tracing or wireshark kind of capability would come via decoding
the ftrace log. When enabled, everything that is written to the channel,
is also written to ftrace.

>>>> +static void show_channel_word(struct output *o, int *state, int *count,
>>>> +  u32 addr, u32 val, struct host1x_cdma *cdma)
>>>> +{
>>>> +  static int start_count, dont_print;
>>>
>>> What if two processes read debug information at the same time?
>>
>> show_channels() acquires cdma.lock, so that shouldn't happen.
> 
> Okay. Another solution would be to pass around a debug context which
> keeps track of the variables. But if we opt for a more involved dump
> interface as discussed above this will no longer be relevant.

Actually, debugging process needs cdma.lock, because it goes through the
cdma queue. Also command FIFO dumping is something that must be done by
a single thread at a time.

> Okay. In the context of a channel dump interface this may not be
> relevant anymore. Can you think of any issue that wouldn't be detectable
> or debuggable by analyzing a binary dump of the data within a channel?
> I'm asking because at that point we wouldn't be able to access any of
> the in-kernel data structures but would have to rely on the data itself
> for diagnostics. IOMMU virtual addresses won't be available and so on.

In many cases, looking at syncpt values, and channel state
(active/waiting on a syncpt, etc) gives an indication on what is the
current state of hardware. But, very often problems are ripple effects
on something that happened earlier and the job that caused the problem
has already been freed and is not visible in the dump.

To get a full history, we need often need the ftrace log.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts

2013-02-06 Thread Terje Bergström
On 06.02.2013 12:38, Thierry Reding wrote:
> On Wed, Feb 06, 2013 at 12:29:26PM -0800, Terje Bergström wrote:
>> This was done purely, because I'm hiding the struct size from the
>> caller. If the caller needs to allocate, I need to expose the struct in
>> a header, not just a forward declaration.
> 
> I don't think we need to hide the struct from the caller. This is all
> host1x internal. Even if a host1x client uses the struct it makes little
> sense to hide it. They are all part of the same code base so there's not
> much to be gained by hiding the structure definition.

I agree, and will change.

>> Ok, I'll add the wrapper, and I'll check if passing struct host1x *
>> would make sense. In effect that'd render struct host1x_intr mostly
>> unused, so how about if we just merge the contents of host1x_intr to host1x?
> 
> We can probably do that. It might make some sense to keep it in order to
> scope the related fields but struct host1x isn't very large yet, so I
> think omitting host1x_intr should be fine.

Yes, it's not very large, and it'd remove a lot of casting between
host1x and host1x_intr, so I'll just do that.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts

2013-02-06 Thread Terje Bergström
On 05.02.2013 00:42, Thierry Reding wrote:
> On Mon, Feb 04, 2013 at 08:29:08PM -0800, Terje Bergström wrote:
>> host1x_get_host() actually needs that, so this #include should've also
>> been in previous patch.
> 
> No need to if you pass struct device * instead. You might need
> linux/device.h instead, though.

Can do.

> Another variant would be host1x_syncpt_irq() for the top-level handler
> and something host1x_handle_syncpt() to handle individual syncpoints. I
> like this one best, but this is pure bike-shedding and there's nothing
> technically wrong with the names you chose, so I can't really object if
> you want to stick to them.

I could use these names. They sound logical to me,too.

> 
>>>> + queue_work(intr->wq, &sp->work);
>>>
>>> Should the call to queue_work() perhaps be moved into
>>> host1x_intr_syncpt_thresh_isr().
>>
>> I'm not sure, either way would be ok to me. The current structure allows
>> host1x_intr_syncpt_thresh_isr() to only take one parameter
>> (host1x_intr_syncpt). If we move queue_work, we'd also need to pass
>> host1x_intr.
> 
> I think I'd still prefer to have all the code in one function because it
> make subsequent modification easier and less error-prone.

Ok, I'll do that change.

> Yeah, in that case I think we should bail out. It's not like we're
> expecting any failures. If the interrupt cannot be requested, something
> must seriously be wrong and we should tell users about it so that it can
> be fixed. Trying to continue on a best effort basis isn't useful here, I
> think.

Yep, I agree.

>> In patch 3, at submit time we first allocate waiter, then take
>> submit_lock, write submit to channel, and add the waiter while having
>> the lock. I did this so that I host1x_intr_add_action() can always
>> succeed. Otherwise I'd need to write another code path to handle the
>> case where we wrote a job to channel, but we're not able to add a
>> submit_complete action to it.
> 
> Okay. In that case why not allocate it on the stack in the first place
> so you don't have to bother with allocations (and potential failure) at
> all? The variable doesn't leave the function scope, so there shouldn't
> be any issues, right?

The submit code in patch 3 allocates a waiter, and the waiter outlives
the function scope. That waiter will clean up job queue once a job is
complete.

> Or if that doesn't work it would still be preferable to allocate memory
> in host1x_syncpt_wait() directly instead of going through the wrapper.

This was done purely, because I'm hiding the struct size from the
caller. If the caller needs to allocate, I need to expose the struct in
a header, not just a forward declaration.

>>>> +int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
>>>
>>> Maybe you should keep the type of the irq_sync here so that it properly
>>> propagates to the call to devm_request_irq().
>>
>> I'm not sure what you mean. Do you mean that I should use unsigned int,
>> as that's the type used in devm_request_irq()?
> 
> Yes.

Ok, will do.

>>>> +void host1x_intr_stop(struct host1x_intr *intr)
>>>> +{
>>>> + unsigned int id;
>>>> + struct host1x *host1x = intr_to_host1x(intr);
>>>> + struct host1x_intr_syncpt *syncpt;
>>>> + u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
>>>> +
>>>> + mutex_lock(&intr->mutex);
>>>> +
>>>> + host1x->intr_op.disable_all_syncpt_intrs(intr);
>>>
>>> I haven't commented on this everywhere, but I think this could benefit
>>> from a wrapper that forwards this to the intr_op. The same goes for the
>>> sync_op.
>>
>> You mean something like "host1x_disable_all_syncpt_intrs"?
> 
> Yes. I think that'd be useful for each of the op functions. Perhaps you
> could even pass in a struct host1x * to make calls more uniform.

Ok, I'll add the wrapper, and I'll check if passing struct host1x *
would make sense. In effect that'd render struct host1x_intr mostly
unused, so how about if we just merge the contents of host1x_intr to host1x?

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver

2013-02-06 Thread Terje Bergström
On 04.02.2013 23:43, Thierry Reding wrote:
> My point was that you could include the call to host1x_syncpt_reset()
> within host1x_syncpt_init(). That will keep unneeded code out of the
> host1x_probe() function. Also you don't want to use the syncpoints
> uninitialized, right?

Of course, sorry, I misunderstood. That makes a lot of sense.

 + */
 +static u32 syncpt_load_min(struct host1x_syncpt *sp)
 +{
 + struct host1x *dev = sp->dev;
 + u32 old, live;
 +
 + do {
 + old = host1x_syncpt_read_min(sp);
 + live = host1x_sync_readl(dev,
 + HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
 + } while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);
>>>
>>> I think this warrants a comment.
>>
>> Sure. It just loops in case there's a race writing to min_val.
> 
> Oh, I see. That'd make a good comment. Is the cast to (u32) really
> necessary?

I'll add a comment. atomic_cmpxchg returns a signed value, so I think
the cast is needed.

> Save/restore has the disadvantage of the direction not being implicit.
> Save could mean save to hardware or save to software. The same is true
> for restore. However if the direction is clearly defined, save and
> restore work for me.
> 
> Maybe the comment could be changed to be more explicit. Something like:
> 
>   /*
>* Write cached syncpoint and waitbase values to hardware.
>*/
> 
> And for host1x_syncpt_save():
> 
>   /*
>* For client-managed registers, update the cached syncpoint and
>* waitbase values by reading from the registers.
>*/

I was using save in the same way as f.ex. i915 (i915_suspend.c): save
state of hardware to RAM, restore state from RAM. I'll add proper
comments, but save and restore are for all syncpts, not only client managed.

> 
 +/*
 + * Updates the last value read from hardware.
 + */
 +u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
 +{
 + u32 val;
 + val = sp->dev->syncpt_op.load_min(sp);
 + trace_host1x_syncpt_load_min(sp->id, val);
 +
 + return val;
 +}
> Maybe the function should be called host1x_syncpt_load() if there is no
> equivalent way to load the maximum value (since there is no register to
> read from).

Sounds good. Maximum is just a software concept.

> That's certainly true for interrupts. However, if you look at the DMA
> subsystem for example, you can also request an unnamed resource.
> 
> The difference is sufficiently subtle that host1x_syncpt_allocate()
> would work for me too, though. I just have a slight preference for
> host1x_syncpt_request().

I don't really have a strong preference, so I'll follow your suggestion.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-04 Thread Terje Bergström
On 04.02.2013 04:56, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Tue, Jan 15, 2013 at 01:44:04PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
>> @@ -270,7 +274,29 @@ static int tegra_drm_unload(struct drm_device *drm)
>>
>>  static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>>  {
>> - return 0;
>> + struct host1x_drm_fpriv *fpriv;
>> + int err = 0;
> 
> Can be dropped.
> 
>> +
>> + fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
>> + if (!fpriv)
>> + return -ENOMEM;
>> +
>> + INIT_LIST_HEAD(&fpriv->contexts);
>> + filp->driver_priv = fpriv;
>> +
>> + return err;
> 
> return 0;

Ok.

> 
>> +static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
>> +{
>> + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(filp);
>> + struct host1x_drm_context *context, *tmp;
>> +
>> + list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
>> + context->client->ops->close_channel(context);
>> + kfree(context);
>> + }
>> + kfree(fpriv);
>>  }
> 
> Maybe you should add host1x_drm_context_free() to wrap the loop
> contents?

Makes sense. Will do.

> 
>> @@ -280,7 +306,204 @@ static void tegra_drm_lastclose(struct drm_device *drm)
>>   drm_fbdev_cma_restore_mode(host1x->fbdev);
>>  }
>>
>> +static int
>> +tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data,
>> +  struct drm_file *file_priv)
> 
> static int and function name on one line, please.

Ok, will re-split the lines.

> 
>> +{
>> + struct host1x *host1x = drm->dev_private;
>> + struct tegra_drm_syncpt_read_args *args = data;
>> + struct host1x_syncpt *sp =
>> + host1x_syncpt_get_bydev(host1x->dev, args->id);
> 
> I don't know if we need this, except maybe to work around the problem
> that we have two different structures named host1x. The _bydev() suffix
> is misleading because all you really do here is obtain the syncpt from
> the host1x.

Yeah, it's actually working around the host1x duplicate naming.
host1x_syncpt_get takes struct host1x as parameter, but that's different
host1x than in this code.

> 
>> +static int
>> +tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
>> +  struct drm_file *file_priv)
>> +{
>> + struct tegra_drm_open_channel_args *args = data;
>> + struct host1x_client *client;
>> + struct host1x_drm_context *context;
>> + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
>> + struct host1x *host1x = drm->dev_private;
>> + int err = 0;
> 
> err = -ENODEV; (see below)

Ok, makes sense.

> 
>> +
>> + context = kzalloc(sizeof(*context), GFP_KERNEL);
>> + if (!context)
>> + return -ENOMEM;
>> +
>> + list_for_each_entry(client, &host1x->clients, list) {
>> + if (client->class == args->class) {
>> + context->client = client;
>> + err = client->ops->open_channel(client, context);
>> + if (err)
>> + goto out;
>> +
>> + list_add(&context->list, &fpriv->contexts);
>> + args->context = (uintptr_t)context;
> 
> Perhaps cast this to __u64 directly instead? There's little sense in
> taking the detour via uintptr_t.

I think compiler complained about a direct cast to __u64, but I'll try
again.

> 
>> + goto out;
> 
> return 0;
> 
>> + }
>> + }
>> + err = -ENODEV;
>> +
>> +out:
>> + if (err)
>> + kfree(context);
>> +
>> + return err;
>> +}
> 
> Then this simply becomes:
> 
> kfree(context);
> return err;

Sounds good.

> 
>> +static int
>> +tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
>> +  struct drm_file *file_priv)
>> +{
>> + struct tegra_drm_open_channel_args *args = data;
>> + struct host1x_drm_context *context, *tmp;
>> + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
>> + int err = 0;
>> +
>> + list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
>> + if ((uintptr_t)context == args->context) {
>> + context->client->ops->close_channel(context);
>> + list_del(&context->list);
>> + kfree(context);
>> + goto out;
>> + }
>> + }
>> + err = -EINVAL;
>> +
>> +out:
>> + return err;
>> +}
> 
> Same comments as for tegra_drm_ioctl_open_channel().

Ok, will apply.

> 
>> +static int
>> +tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
>> +  struct drm_file *file_priv)
>> +{
>> + struct tegra_drm_get_channel_param_args *args = data;
>> + struct host1x_drm_context *context;
>> + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
>>

Re: [PATCHv5,RESEND 7/8] ARM: tegra: Add board data and 2D clocks

2013-02-04 Thread Terje Bergström
On 04.02.2013 03:26, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Tue, Jan 15, 2013 at 01:44:03PM +0200, Terje Bergstrom wrote:
>> Add a driver alias gr2d for Tegra 2D device, and assign a duplicate
>> of 2D clock to that driver alias.
>>
>> Signed-off-by: Terje Bergstrom 
>> ---
>>  arch/arm/mach-tegra/board-dt-tegra20.c|1 +
>>  arch/arm/mach-tegra/board-dt-tegra30.c|1 +
>>  arch/arm/mach-tegra/tegra20_clocks_data.c |2 +-
>>  arch/arm/mach-tegra/tegra30_clocks_data.c |1 +
>>  4 files changed, 4 insertions(+), 1 deletion(-)
> 
> With Prashant's clock rework patches now merged this patch can be
> dropped.

Yes, and I'll also need to start calling 2D clock with name NULL in gr2d.c.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 5/8] drm: tegra: Move drm to live under host1x

2013-02-04 Thread Terje Bergström
On 04.02.2013 03:08, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Tue, Jan 15, 2013 at 01:44:01PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
>> index 697d49a..ffc8bf1 100644
>> --- a/drivers/gpu/host1x/Makefile
>> +++ b/drivers/gpu/host1x/Makefile
>> @@ -12,4 +12,10 @@ host1x-y = \
>>  hw/host1x01.o
>>  
>>  host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
>> +
>> +ccflags-y += -Iinclude/drm
>> +ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
>> +
>> +host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o
>> +host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o
>>  obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
> 
> Can this be moved into a separate Makefile in the drm subdirectory?

I tried, and kernel build helpfully created two .ko files. As having
cyclic dependencies between two modules isn't nice, I merged them to
same module and that seemed to force merging Makefile.

If anybody has an idea on how to do it otherwise, I'd be happy to keep
the Makefiles separate.

> 
>> diff --git a/drivers/gpu/host1x/host1x_client.h 
>> b/drivers/gpu/host1x/host1x_client.h
> [...]
>> new file mode 100644
>> index 000..fdd2920
>> --- /dev/null
>> +++ b/drivers/gpu/host1x/host1x_client.h
>> @@ -0,0 +1,25 @@
>> +/*
>> + * Copyright (c) 2013, NVIDIA Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#ifndef HOST1X_CLIENT_H
>> +#define HOST1X_CLIENT_H
>> +
>> +struct platform_device;
>> +
>> +void host1x_set_drm_data(struct platform_device *pdev, void *data);
>> +void *host1x_get_drm_data(struct platform_device *pdev);
>> +
>> +#endif
> 
> These aren't defined or used yet.

Hmm, right, they would go to patch 7.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-04 Thread Terje Bergström
On 04.02.2013 03:03, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
>> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> [...]
>> +static pid_t host1x_debug_null_kickoff_pid;
>> +unsigned int host1x_debug_trace_cmdbuf;
>> +
>> +static pid_t host1x_debug_force_timeout_pid;
>> +static u32 host1x_debug_force_timeout_val;
>> +static u32 host1x_debug_force_timeout_channel;
> 
> Please group static and non-static variables.

Will do.

> 
>> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
> [...]
>> +struct output {
>> +void (*fn)(void *ctx, const char *str, size_t len);
>> +void *ctx;
>> +char buf[256];
>> +};
> 
> Do we really need this kind of abstraction? There really should be only
> one location where debug information is obtained, so I don't see a need
> for this.

This is used by debugfs code to direct to debugfs, and
nvhost_debug_dump() to send via printk.

> 
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> [...]
>>  struct host1x_syncpt_ops {
>>  void (*reset)(struct host1x_syncpt *);
>>  void (*reset_wait_base)(struct host1x_syncpt *);
>> @@ -117,6 +133,7 @@ struct host1x {
>>  struct host1x_channel_ops channel_op;
>>  struct host1x_cdma_ops cdma_op;
>>  struct host1x_pushbuffer_ops cdma_pb_op;
>> +struct host1x_debug_ops debug_op;
>>  struct host1x_syncpt_ops syncpt_op;
>>  struct host1x_intr_ops intr_op;
> 
> Again, better to pass in a const pointer to the ops structure.

Ok.

> 
>> diff --git a/drivers/gpu/host1x/hw/debug_hw.c 
>> b/drivers/gpu/host1x/hw/debug_hw.c
> 
>> +static int show_channel_command(struct output *o, u32 addr, u32 val, int 
>> *count)
>> +{
>> +unsigned mask;
>> +unsigned subop;
>> +
>> +switch (val >> 28) {
>> +case 0x0:
> 
> These can easily be derived by looking at the debug output, but it may
> still make sense to assign symbolic names to them.

I have another suggestion. In downstream I removed the decoding part and
I just print out a string of hex. That removes quite a bit bunch of code
from kernel. It makes the debug output also more compact.

It's much easier to write a user space program to decode than maintain
it in kernel.

> 
>> +static void show_channel_word(struct output *o, int *state, int *count,
>> +u32 addr, u32 val, struct host1x_cdma *cdma)
>> +{
>> +static int start_count, dont_print;
> 
> What if two processes read debug information at the same time?

show_channels() acquires cdma.lock, so that shouldn't happen.

> 
>> +static void do_show_channel_gather(struct output *o,
>> +phys_addr_t phys_addr,
>> +u32 words, struct host1x_cdma *cdma,
>> +phys_addr_t pin_addr, u32 *map_addr)
>> +{
>> +/* Map dmaget cursor to corresponding mem handle */
>> +u32 offset;
>> +int state, count, i;
>> +
>> +offset = phys_addr - pin_addr;
>> +/*
>> + * Sometimes we're given different hardware address to the same
>> + * page - in these cases the offset will get an invalid number and
>> + * we just have to bail out.
>> + */
> 
> Why's that?

Because of a race - memory might've been unpinned and unmapped from
IOMMU and when we re-map (pin), we are given a new address.

But, I think this comment is a bit stale - we used to dump also old
gathers. The latest code only dumps jobs in sync queue, so the race
shouldn't happen.

> 
>> +map_addr = host1x_memmgr_mmap(mem);
>> +if (!map_addr) {
>> +host1x_debug_output(o, "[could not mmap]\n");
>> +return;
>> +}
>> +
>> +/* Get base address from mem */
>> +sgt = host1x_memmgr_pin(mem);
>> +if (IS_ERR(sgt)) {
>> +host1x_debug_output(o, "[couldn't pin]\n");
>> +host1x_memmgr_munmap(mem, map_addr);
>> +return;
>> +}
> 
> Maybe you should stick with one of "could not" or "couldn't". Makes it
> easier to search for.

I prefer "could not", so I'll use that.

> 
>> +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
>> +{
>> +struct host1x_job *job;
>> +
>> +list_for_each_entry(job, &cdma->sync_queue, list) {
>> +int i;
>> +host1x_debug_output(o,
>> +"\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
>> +" first_get=%08x, timeout=%d"
>> +" num_slots=%d, num_handles=%d\n",
>> +job,
>> +job->syncpt_id,
>> +job->syncpt_end,
>> +job->first_get,
>> +job->timeout,
>> +job->num_slots,
>> +job->num_unpins);
> 
> This could go on fewer lines.

Yes, will merge.

> 
>> +static void host1x_debug_show_channel_cdma(struct host1x *m,
>> +struct host1x_channel *ch, s

Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts

2013-02-04 Thread Terje Bergström
On 04.02.2013 02:30, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:43:58PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> [...]
>> @@ -95,7 +96,6 @@ static int host1x_probe(struct platform_device *dev)
>>
>>   /* set common host1x device data */
>>   platform_set_drvdata(dev, host);
>> -
>>   host->regs = devm_request_and_ioremap(&dev->dev, regs);
>>   if (!host->regs) {
>>   dev_err(&dev->dev, "failed to remap host registers\n");
> 
> This seems an unrelated (and actually undesirable) change.
> 
>> @@ -109,28 +109,40 @@ static int host1x_probe(struct platform_device *dev)
>>   }
>>
>>   err = host1x_syncpt_init(host);
>> - if (err)
>> + if (err) {
>> + dev_err(&dev->dev, "failed to init sync points");
>>   return err;
>> + }
> 
> This error message should probably have gone in the previous patch as
> well.

Oops, will move these to previous patch. I'm pretty sure I already fixed
this once. :-(

> 
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
>> index d8f5979..8376092 100644
>> --- a/drivers/gpu/host1x/dev.h
>> +++ b/drivers/gpu/host1x/dev.h
>> @@ -17,11 +17,12 @@
>>  #ifndef HOST1X_DEV_H
>>  #define HOST1X_DEV_H
>>
>> +#include 
>>  #include "syncpt.h"
>> +#include "intr.h"
>>
>>  struct host1x;
>>  struct host1x_syncpt;
>> -struct platform_device;
> 
> Why include platform_device.h here?

host1x_get_host() actually needs that, so this #include should've also
been in previous patch.

> 
>> @@ -34,6 +35,18 @@ struct host1x_syncpt_ops {
>>   const char * (*name)(struct host1x_syncpt *);
>>  };
>>
>> +struct host1x_intr_ops {
>> + void (*init_host_sync)(struct host1x_intr *);
>> + void (*set_host_clocks_per_usec)(
>> + struct host1x_intr *, u32 clocks);
> 
> Could the above two not be combined? The only reason to keep them
> separate would be if the host1x clock was dynamically changed, but I
> don't think we support that, right?

I've left this as a placeholder to at some point start supporting host1x
clock scaling. But I don't think we're going to do that for a while, so
I could merge them.

> 
>> + void (*set_syncpt_threshold)(
>> + struct host1x_intr *, u32 id, u32 thresh);
>> + void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
>> + void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
>> + void (*disable_all_syncpt_intrs)(struct host1x_intr *);
> 
> Can disable_all_syncpt_intrs() not be implemented generically using the
> number of syncpoints as exposed by host1x_device_info and the
> .disable_syncpt_intr() function?

disable_all_syncpt_intrs() disables all interrupts in one write (or one
per 32 sync points), so it's more efficient.

> 
>> @@ -46,11 +59,13 @@ struct host1x_device_info {
>>  struct host1x {
>>   void __iomem *regs;
>>   struct host1x_syncpt *syncpt;
>> + struct host1x_intr intr;
>>   struct platform_device *dev;
>>   struct host1x_device_info info;
>>   struct clk *clk;
>>
>>   struct host1x_syncpt_ops syncpt_op;
>> + struct host1x_intr_ops intr_op;
> 
> I think carrying a const pointer to the interrupt operations structure
> is a better option here.

Ok.

> 
>> diff --git a/drivers/gpu/host1x/hw/intr_hw.c 
>> b/drivers/gpu/host1x/hw/intr_hw.c
> [...]
>> +static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt 
>> *syncpt);
> 
> Can we avoid this forward declaration?

I think we can, if I just move the isr to top of file.

> 
>> +static void syncpt_thresh_cascade_fn(struct work_struct *work)
> 
> syncpt_thresh_work()?

Sounds good.

> 
>> +{
>> + struct host1x_intr_syncpt *sp =
>> + container_of(work, struct host1x_intr_syncpt, work);
>> + host1x_syncpt_thresh_fn(sp);
> 
> Couldn't we inline the host1x_syncpt_thresh_fn() implementation here?
> Why do we need to go through an external function declaration?

If I move syncpt_thresh_work() to intr.c from intr_hw.c, I could do
that. That'd simplify the interrupt path.

> 
>> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
>> +{
>> + struct host1x *host1x = dev_id;
>> + struct host1x_intr *intr = &host1x->intr;
>> + unsigned long reg;
>> + int i, id;
>> +
>> + for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
>> + reg = host1x_sync_readl(host1x,
>> + HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
>> + i * REGISTER_STRIDE);
>> + for_each_set_bit(id, ®, BITS_PER_LONG) {
>> + struct host1x_intr_syncpt *sp =
>> + intr->syncpt + (i * BITS_PER_LONG + id);
>> + host1x_intr_syncpt_thresh_isr(sp);
> 
> Have you considered mimicking the IRQ API and name this something like
> host1x_intr_syncpt_thresh_handle() and name the actual ISR just
> syncpt_thresh_isr()? Not so importan

Re: [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver

2013-02-04 Thread Terje Bergström
On 04.02.2013 01:09, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:43:57PM +0200, Terje Bergstrom wrote:
>> Add host1x, the driver for host1x and its client unit 2D.
> 
> Maybe this could be a bit more verbose. Perhaps describe what host1x is.

Sure. I could just steal the paragraph from Stephen:

The Tegra host1x module is the DMA engine for register access to Tegra's
graphics- and multimedia-related modules. The modules served by host1x
are referred to as clients. host1x includes some other  functionality,
such as synchronization.

>> diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
> [...]
>> @@ -0,0 +1,6 @@
>> +config TEGRA_HOST1X
>> + tristate "Tegra host1x driver"
>> + help
>> +   Driver for the Tegra host1x hardware.
> 
> Maybe s/Tegra/NVIDIA Tegra/?

Sounds good.

>> +
>> +   Required for enabling tegradrm.
> 
> This should probably be dropped. Either encode such knowledge as
> explicit dependencies or in this case just remove it altogether since we
> will probably merge both drivers anyway.

I think this was left from previous versions. Now it just doesn't make
sense. I'll just drop it.

> 
>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> [...]
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include "dev.h"
> 
> Maybe add a blank line between the previous two lines to visually
> separate standard Linux includes from driver-specific ones.

Ok. You commented in quite few places in a similar way. I'll fix all of
them to first include system includes, then driver's own includes, and
add a blank line in between.

> 
>> +#include "hw/host1x01.h"
>> +
>> +#define CREATE_TRACE_POINTS
>> +#include 
>> +
>> +#define DRIVER_NAME  "tegra-host1x"
> 
> You only ever use this once, so maybe it can just be dropped?

Yes.

> 
>> +static struct host1x_device_info host1x_info = {
> 
> Perhaps this should be host1x01_info in order to match the hardware
> revision? That'll avoid it having to be renamed later on when other
> revisions start to appear.

Ok, will do. I thought it'd be awkward being alone until the second
version appears, but I'll add it.

> 
>> +static int host1x_probe(struct platform_device *dev)
>> +{
> [...]
>> + syncpt_irq = platform_get_irq(dev, 0);
>> + if (IS_ERR_VALUE(syncpt_irq)) {
> 
> This is somewhat unusual. It should be fine to just do:
> 
> if (syncpt_irq < 0)
> 
> but IS_ERR_VALUE() should work fine too.

I'll use the simpler version.

> 
>> + memcpy(&host->info, devid->data, sizeof(struct host1x_device_info));
> 
> Why not make the .info field a pointer to struct host1x_device_info
> instead? That way you don't have to keep separate copies of the same
> information.

This had something to do with __init data and non-init data. But, we're
not really putting this data into __init, so we should be able to use
just a pointer.

> 
>> + /* set common host1x device data */
>> + platform_set_drvdata(dev, host);
>> +
>> + host->regs = devm_request_and_ioremap(&dev->dev, regs);
>> + if (!host->regs) {
>> + dev_err(&dev->dev, "failed to remap host registers\n");
>> + return -ENXIO;
>> + }
> 
> This should probably be rewritten as:
> 
> host->regs = devm_ioremap_resource(&dev->dev, regs);
> if (IS_ERR(host->regs))
> return PTR_ERR(host->regs);
> 
> Though that function will only be available in 3.9-rc1.

Ok, 3.9-rc1 is fine as a basis.

>> + err = host1x_syncpt_init(host);
>> + if (err)
>> + return err;
> [...]
>> + host1x_syncpt_reset(host);
> 
> Why separate host1x_syncpt_reset() from host1x_syncpt_init()? I see why
> it might be useful to have host1x_syncpt_reset() as a separate function
> but couldn't it be called as part of host1x_syncpt_init()?

host1x_syncpt_init() is used for initializing the syncpt structures, and
is called in probe. host1x_syncpt_reset() should be called whenever we
think hardware state is lost, for example if VDD_CORE was rail gated due
to system suspend.

> 
>> + dev_info(&dev->dev, "initialized\n");
> 
> I don't think this is very useful. We should make sure to tell people
> when things fail. When everything goes as planned we don't need to brag
> about it =)

True. I wish other kernel drivers followed that same philosophy. :-)
I'll remove that. It's mainly useful as debug help, but it's as easy to
check from sysfs the state.

> 
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> [...]
>> +struct host1x_syncpt_ops {
> [...]
>> + const char * (*name)(struct host1x_syncpt *);
> 
> Why do we need this? Could we not refer to the syncpt name directly
> instead of going through this wrapper? I'd expect the name to be static.

This must be a relic. I'll remove the wrapper.

> 
>> +struct host1x_device_info {
> 
> Maybe this should be called simply host1x_info? _device seems redundant.

Sure.

> 
>> + int  

Re: [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware

2013-01-22 Thread Terje Bergström
On 15.01.2013 13:43, Terje Bergstrom wrote:
> This set of patches adds support for Tegra20 and Tegra30 host1x and
> 2D. It is based on linux-next-20130114. The set was regenerated with
> git format-patch -M.

I have pushed both the kernel patches and libdrm changes to
g...@gitorious.org:linux-host1x/linux-host1x.git and
g...@gitorious.org:linux-host1x/libdrm-host1x.git.

They're not intended to compete with any other repository - I just
wanted to have one place where people can download the kernel patches,
libdrm changes and test suite. I'll remove them once they've served
their purpose.

I'd appreciate feedback on the patches. So far the only feedback has
been from Stephen about clock changes.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5 7/8] ARM: tegra: Add board data and 2D clocks

2013-01-16 Thread Terje Bergström
On 15.01.2013 20:44, Stephen Warren wrote:
> On 01/15/2013 04:26 AM, Terje Bergstrom wrote:
>> Add a driver alias gr2d for Tegra 2D device, and assign a duplicate
>> of 2D clock to that driver alias.
> 
> FYI on this one patch - it won't be applied to the Tegra tree until
> after Prashant's common clock framework changes are applied. As such, it
> will need some rework once those patches are applied, or perhaps won't
> even be relevant any more; see below.

It looks like I need to just drop the patch 7(8 "ARM: tegra: Add board
data and 2D clocks" and change the 2D code to access the clock without a
name. Do you agree?

I'll roll this into the next patch version once I've gathered more comments.

diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c
index dc7d6c6..a1c5f5c 100644
--- a/drivers/gpu/host1x/drm/gr2d.c
+++ b/drivers/gpu/host1x/drm/gr2d.c
@@ -259,7 +259,7 @@ static int gr2d_probe(struct platform_device *dev)
if (!gr2d)
return -ENOMEM;

-   gr2d->clk = devm_clk_get(&dev->dev, "gr2d");
+   gr2d->clk = devm_clk_get(&dev->dev, NULL);
if (IS_ERR(gr2d->clk)) {
dev_err(&dev->dev, "cannot get clock\n");
return PTR_ERR(gr2d->clk);

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5 7/8] ARM: tegra: Add board data and 2D clocks

2013-01-15 Thread Terje Bergström
On 15.01.2013 20:44, Stephen Warren wrote:
>> diff --git a/arch/arm/mach-tegra/board-dt-tegra20.c 
>> b/arch/arm/mach-tegra/board-dt-tegra20.c
> 
>> +OF_DEV_AUXDATA("nvidia,tegra20-gr2d", 0x5414, "gr2d", NULL),
> 
> I assume the only reason to add AUXDATA is to give the device a specific
> name, which will then match the driver name in the clock driver:
> 
>> -CLK_DUPLICATE("2d", "tegra_grhost", "gr2d"),
>> +CLK_DUPLICATE("2d", "gr2d", "gr2d"),
> 
> If so, this shouldn't be needed once the common clock framework patches
> are applied, since all device clocks will be retrieved from device tree,
> and hence the device name will be irrelevant; the phandle in device tree
> is all that will matter.

Yes, clock binding is the only reason for the OF_DEV_AUXDATA line. I'll
need to look into Prashant's clock changes, but I assume it's going to
be a trivial change to host1x patches.

Thanks for the heads-up.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-15 Thread Terje Bergström
On 15.01.2013 13:30, Thierry Reding wrote:
> Sorry for not getting back to you on this earlier. I just remembered
> this thread when I saw Terje's latest patch series.
> 
> I agree that having everything in one location will make things a lot
> easier, even if it means we have to add the tegra-drm driver to a new
> location. In the long run I think this will pay off, though.
> 
> That said, I see that Terje has chosen this approach in his latest
> series, so it's all good.

*whew* Thanks Thierry. I'm not entirely sure that drivers/gpu/host1x is
the correct location, though - host1x looks pretty lonely there. If
anybody has a strong opinion about the location, I'm willing to adjust.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-07 Thread Terje Bergström
On 04.01.2013 22:25, Stephen Warren wrote:
> On 01/04/2013 03:09 AM, Terje Bergström wrote:
> ...
>> I think we have now two ways to go forward with cons and pros:
>>  1) Keep host1x and tegra-drm as separate driver
>>+ Code almost done
>>- we need dummy device and dummy driver
>>- extra code and API when host1x creates dummy device and its passed
>> to tegra-drm
> 
> Just to play devil's advocate:
> 
> I suspect that's only a few lines of code.

Yes, that's true. There's some overhead, but there's not too many actual
code lines.

>>- tegra-drm device would need to be a child of host1x device. Having
>> virtual and real devices as host1x children sounds weird.
> 
> And I doubt that would cause problems.

True. It could become a problem if the driver just assumed that all
host1x children are actual hardware, but we could avoid that.

>>  2) Merge host1x and tegra-drm into one module. drm is a subcomponent,
>> and whatever other personalities we wish would also be subcomponents of
>> host1x. host1x calls tegra-drm directly to handle preparation for drm
>> initialization. As they're in the same module, circular dependency is ok.
>>+ Simpler conceptually (no dummy device/driver)
>>+ Less code
>>- Proposal doesn't yet exist
> 
> But that said, I agree this approach would be very reasonable; it seems
> to me that host1x really is the main HW behind a DRM driver or a V4L2
> driver or ... As such, it seems quite reasonable for a single struct
> device to exist that represents host1x, and for the driver for that
> device to register both a DRM and a V4L2 driver etc. The code could
> physically be organized into separate modules, and under different
> Kconfig options for configurability etc.
> 
> But either way, I'll let you (Thierry and Terje) work out which way to go.

If we want separate modules, we'd need the dummy device & dummy driver
binding between them. We could also just put them in the same module.
It'd make DRM a requirement to host1x driver, but given the current
structure, I think that'd be reasonable. We could later make it more
configurable if needed. Does this now make tegra-drm and host1x too
dependent on each other? I'm not sure.

I also like the fact that we don't have to export APIs to include, but
we can (if we so choose) keep all tegra-drm-host1x-APIs in local header
files. As we have noted, the two drivers are tightly interconnected,
changing the APIs is easier if we can just work on the same directory
hierarchy.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-04 Thread Terje Bergström
On 21.12.2012 23:19, Stephen Warren wrote:
> I see the situation more like:
> 
> * There's host1x hardware.
> 
> * There's a low-level driver just for host1x itself; the host1x driver.
> 
> * There's a high-level driver for the entire host1x complex of devices.
> That is tegradrm. There may be more high-level drivers in the future
> (e.g. v4l2 camera driver if it needs to aggregate a bunch of host1x
> sub-devices liek tegradrm does).
> 
> * The presence of the host1x DT node logically implies that both the two
> drivers in the previous two points should be instantiated.
> 
> * Linux instantiates a single device per DT node.
> 
> * So, it's host1x's responsibility to instantiate the other device(s). I
> think it's reasonable for the host1x driver to know exactly what
> features the host1x HW complex supports; raw host1x access being one,
> and the higher level concept of a tegradrm being another. So, it's
> reasonable for host1x to trigger the instantiation of tegradrm.
> 
> * If DRM core didn't stomp on the device object's drvdata (or whichever
> field it is), I would recommend not creating a dummy device at all, but
> rather having the host1x driver directly implement multiple driver
> interfaces. There are many examples like this already in the kernel,
> e.g. combined GPIO+IRQ driver, combined GPIO+IRQ+pinctrl driver, etc.

We had a phone call with Stephen yesterday, and I finally understood why
unbroken chain from DT to tegra-drm is important. The goals are to be
able to modprobe tegra-drm without ill effects, and to auto-load
tegra-drm module. I had been chasing a totally different goal.

Sorry about all the churn.

I think we have now two ways to go forward with cons and pros:
 1) Keep host1x and tegra-drm as separate driver
   + Code almost done
   - we need dummy device and dummy driver
   - extra code and API when host1x creates dummy device and its passed
to tegra-drm
   - tegra-drm device would need to be a child of host1x device. Having
virtual and real devices as host1x children sounds weird.

 2) Merge host1x and tegra-drm into one module. drm is a subcomponent,
and whatever other personalities we wish would also be subcomponents of
host1x. host1x calls tegra-drm directly to handle preparation for drm
initialization. As they're in the same module, circular dependency is ok.
   + Simpler conceptually (no dummy device/driver)
   + Less code
   - Proposal doesn't yet exist

Thierry, what do you think? I'd prefer option 2, as that keeps things
simple and still fulfills the requirements. We probably would redo the
patch "Remove redundant host1x" to actually move drm under host1x, and
adds calls from host1x to parse_dt() directly. We'd probably leave the
code otherwise mostly as it is now, so we'll drop whatever modifications
we've done so far in my proposals. You can later pick up some things
from our proposals if you want, but it's then up to you.

We could also later think about generalizing some of the list management
to belong to host1x, but that'd be a follow-up and we can decide that later.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 5/8] drm: tegra: Remove redundant host1x

2013-01-03 Thread Terje Bergström
On 21.12.2012 16:36, Thierry Reding wrote:
> On Fri, Dec 21, 2012 at 01:39:21PM +0200, Terje Bergstrom wrote:
>> +static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
>> +{
>> +
>> +}
>> +
> 
> This can be removed, right?

Yes, done.

> 
>> +static struct platform_driver tegra_drm_platform_driver = {
>> +.driver = {
>> +.name = "tegradrm",
> 
> This should be "tegra-drm" to match the module name.

Done.

>> -struct host1x_client;
>> +struct tegra_drm_client;
> 
> I don't see the point in renaming this. All of the devices are still
> host1x clients, right? This patch would be a whole shorter if we didn't
> rename these. None of these symbols are exported either so there's not
> much chance for them to clash with anything.

Yep, we renamed it back to make the patch smaller.

>> diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
>> new file mode 100644
>> index 000..8632f49
>> --- /dev/null
>> +++ b/include/drm/tegra_drm.h
>> @@ -0,0 +1,20 @@
>> +/*
>> + * Copyright (c) 2012, NVIDIA CORPORATION.  All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#ifndef _TEGRA_DRM_H_
>> +#define _TEGRA_DRM_H_
>> +
>> +#endif
> 
> This can be removed as well.

Removed.

I posted another proposal on how to handle initialization in tegradrm.
It removes a lot of code and relies more on platform_bus keeping track
of devices. Have you had time to look into it?

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 21.12.2012 15:50, Lucas Stach wrote:
> Am Freitag, den 21.12.2012, 13:39 +0200 schrieb Terje Bergstrom:
>> Some of the issues left open:
>>  * Register definitions still use static inline. There has been a
>>debate about code style versus ability to use compiler type
>>checking and code coverage analysis. There was no conclusion, so
>>I left it as was.
> This has to be resolved before merging. Personally I'm in favour of
> keeping reg access patterns close to what is done in other parts of the
> kernel.

How about if I define inline functions, and #defines to proxy to them?
Something like:

static inline u32 host1x_sync_cfpeek_ctrl_r(void)
{
return 0x74c;
}
#define HOST1X_SYNC_CFPEEK_CTRL \
host1x_sync_cfpeek_ctrl_r()

static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_f(u32 v)
{
return (v & 0x1) << 31;
}
#define HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ENA_F(v) \
host1x_sync_cfpeek_ctrl_cfpeek_ena_f(v)

I'd use the macros in .c files. That way the references will look
familiar to reader of .c files, but we can still get the benefits of
compiler processing (type checking, better error codes etc) and gcov
coverage tracking.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 03.01.2013 05:31, Mark Zhang wrote:
> Sorry I didn't get it. Yes, in current design, you can pin all mem
> handles in one time but I haven't found anything related with "locking
> only once per submit".
> 
> My idea is:
> - remove "job->addr_phys"
> - assign "job->reloc_addr_phys" & "job->gather_addr_phys" separately
> - In "pin_job_mem", just call "host1x_memmgr_pin_array_ids" twice to
> fill the "reloc_addr_phy" & "gather_addr_phys".
> 
> Anything I misunderstood?

The current design uses CMA, which makes pinning basically a no-op. When
we have IOMMU support, pinning requires calling into IOMMU. Updating
SMMU tables requires locking, and probably maintenance before SMMU code
also requires its own locked data tables. For example, preventing
duplicate pinning might require a global table of handles.

Putting all of the handles in one table allows doing duplicate detection
across cmdbuf and reloc tables. This allows pinning each buffer exactly
once, which reduces number of calls to IOMMU.

> "host1x_cma_pin_array_ids" doesn't return negative value right now, so
> maybe you need to take a look at it.

True, and also a consequence of using CMA: pinning can never fail. With
IOMMU, pinning can fail.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 3/8] gpu: host1x: Add channel support

2013-01-02 Thread Terje Bergström
On 02.01.2013 11:31, Mark Zhang wrote:
> On 01/02/2013 05:31 PM, Terje Bergström wrote:
>> That's intentional. Writing a job to channel is atomic, so lock is taken
>> from host1x_cdma_begin() until host1x_cdma_end().
>>
> 
> Okay. So can we consider that lock and unlock this mutex in the function
> which calls host1x_cdma_begin and host1x_cdma_end, that means, in
> current implementation, function "channel_submit"?

Yes, exactly.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 28.12.2012 11:14, Mark Zhang wrote:
> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
> index a936902..c3ded60 100644
> --- a/drivers/gpu/drm/tegra/gr2d.c
> +++ b/drivers/gpu/drm/tegra/gr2d.c
> @@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context
> *context,
> if (err)
> goto fail;
> 
> +   /* We define CMA as an temporary solution in host1x driver.
> +  That's also why we have a CMA kernel config in it.
> +  But seems here, in tegradrm, we hardcode the CMA here.
> +  So what do you do when host1x change to IOMMU?
> +  Do we also need to define a CMA config in tegradrm
> driver,
> +  then after host1x changes to IOMMU, we add another IOMMU
> +  config in tegradrm? Or we should invent another more
> +  generic wrapper layer here? */
> cmdbuf.mem = handle_cma_to_host1x(drm, file_priv,
> cmdbuf.mem);

Lucas is working on host1x allocator, so let's defer this comment until
we have Lucas' code.

> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index cc8ca2f..0867b72 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
> @@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct
> host1x_channel *ch,
> mem += num_unpins * sizeof(dma_addr_t);
> job->pin_ids = num_unpins ? mem : NULL;
> 
> +   /* I think this is a somewhat ugly design.
> +  Actually "addr_phys" is consisted by "reloc_addr_phys" and
> +  "gather_addr_phys".
> +  Why don't we just declare "reloc_addr_phys" and
> "gather_addr_phys"?
> +  In current design, let's say if one nvhost newbie changes the
> order
> +  of these 2 blocks of code in function "pin_job_mem":
> +
> +  for (i = 0; i < job->num_relocs; i++) {
> +   struct host1x_reloc *reloc = &job->relocarray[i];
> +   job->pin_ids[count] = reloc->target;
> +   count++;
> +  }
> +
> +  for (i = 0; i < job->num_gathers; i++) {
> +   struct host1x_job_gather *g = &job->gathers[i];
> +   job->pin_ids[count] = g->mem_id;
> +   count++;
> +  }
> +
> +  Then likely something weird occurs... */

We do this because this way we can implement batch pinning of memory
handles. This way we can decrease memory handle management a lot as we
need to do locking only once per submit.

Decreasing memory management overhead is really important, because in
complex graphics cases, there are might be a hundreds of buffer
references per submit, and several submits per frame. Any extra overhead
relates directly to reduced performance.

> job->reloc_addr_phys = job->addr_phys;
> job->gather_addr_phys = &job->addr_phys[num_relocs];
> 
> @@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
> }
> }
> 
> +   /* I have question here. Does this mean the address info
> +  which need to be relocated(according to the libdrm codes,
> +  seems this address is "0xDEADBEEF") always staying at the
> +  beginning of a page? */
> __raw_writel(
> (job->reloc_addr_phys[i] +
> reloc->target_offset) >> reloc->shift,

No - the slot can be anywhere. That's why we have cmdbuf_offset in the
reloc struct.

> @@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct
> platform_device *pdev)
> }
> }
> 
> +   /* if (host1x_firewall && !err) { */
> if (host1x_firewall) {
> err = copy_gathers(job, pdev);
> if (err) {

Will add.

> @@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct
> platform_device *pdev)
> }
> }
> 
> +/* Rename this label to "out" or something else.
> +   Because if everything goes right, the codes under this label also
> +   get executed. */
>  fail:
> wmb();

Will do.

> 
> diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
> index f3954f7..bb5763d 100644
> --- a/drivers/gpu/host1x/memmgr.c
> +++ b/drivers/gpu/host1x/memmgr.c
> @@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct
> platform_device *dev,
> count, &unpin_data[pin_count],
> phys_addr);
> 
> +   /* I don't think the current "host1x_cma_pin_array_ids"
> +  is able to return a negative value. So this "if" doesn't
> +  make sense...*/
> if (cma_count < 0) {
> /* clean up previous handles */
> while (pin_count) {

It should return negative in case of error.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" i

Re: [PATCHv4 3/8] gpu: host1x: Add channel support

2013-01-02 Thread Terje Bergström
On 02.01.2013 09:40, Mark Zhang wrote:
> On 12/21/2012 07:39 PM, Terje Bergstrom wrote:
>> Add support for host1x client modules, and host1x channels to submit
>> work to the clients. The work is submitted in GEM CMA buffers, so
>> this patch adds support for them.
>>
>> Signed-off-by: Terje Bergstrom 
>> ---
> [...]
>> +/*
>> + * Begin a cdma submit
>> + */
>> +int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
>> +{
>> +struct host1x *host1x = cdma_to_host1x(cdma);
>> +
>> +mutex_lock(&cdma->lock);
>> +
>> +if (job->timeout) {
>> +/* init state on first submit with timeout value */
>> +if (!cdma->timeout.initialized) {
>> +int err;
>> +err = host1x->cdma_op.timeout_init(cdma,
>> +job->syncpt_id);
>> +if (err) {
>> +mutex_unlock(&cdma->lock);
>> +return err;
>> +}
>> +}
>> +}
>> +if (!cdma->running)
>> +host1x->cdma_op.start(cdma);
>> +
>> +cdma->slots_free = 0;
>> +cdma->slots_used = 0;
>> +cdma->first_get = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
>> +
>> +trace_host1x_cdma_begin(job->ch->dev->name);
> 
> Seems missing "mutex_unlock(&cdma->lock);" here.

That's intentional. Writing a job to channel is atomic, so lock is taken
from host1x_cdma_begin() until host1x_cdma_end().

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 3/8] gpu: host1x: Add channel support

2013-01-02 Thread Terje Bergström
On 22.12.2012 06:17, Steven Rostedt wrote:
> On Fri, 2012-12-21 at 13:39 +0200, Terje Bergstrom wrote:
>> +TRACE_EVENT(host1x_cdma_begin,
>> +TP_PROTO(const char *name),
>> +
>> +TP_ARGS(name),
>> +
>> +TP_STRUCT__entry(
>> +__field(const char *, name)
>> +),
>> +
>> +TP_fast_assign(
>> +__entry->name = name;
>> +),
>> +
>> +TP_printk("name=%s",
>> +__entry->name)
>> +);
>> +
>> +TRACE_EVENT(host1x_cdma_end,
>> +TP_PROTO(const char *name),
>> +
>> +TP_ARGS(name),
>> +
>> +TP_STRUCT__entry(
>> +__field(const char *, name)
>> +),
>> +
>> +TP_fast_assign(
>> +__entry->name = name;
>> +),
>> +
>> +TP_printk("name=%s",
>> +__entry->name)
>> +);
> 
> The above two should be combined into a DECLARE_EVENT_CLASS() and
> DEFINE_EVENT()s. Saves text and data space that way.

Will do.

(...)
> If you can combine any of these others into DECLARE_EVENT_CLASS() and
> DEFINE_EVENT()s, please do.

I didn't find others with same parameters.

Thanks!

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 26.12.2012 11:42, Mark Zhang wrote:
> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
> index a936902..28954b3 100644
> --- a/drivers/gpu/drm/tegra/gr2d.c
> +++ b/drivers/gpu/drm/tegra/gr2d.c
> @@ -247,6 +247,10 @@ static int __devinit gr2d_probe(struct
> platform_device *dev)
>  {
> int err;
> struct gr2d *gr2d = NULL;
> +   /* Cause drm_device is created in host1x driver. So
> +  if host1x drivers loads after tegradrm, then in this
> +  gr2d_probe function, this "drm_device" will be NULL.
> +  How can we handle this? Defer driver probe? */

I will push a new proposal about how the devices & drivers get probed.

> struct platform_device *drm_device =
> host1x_drm_device(to_platform_device(dev->dev.parent));
> struct tegradrm *tegradrm = platform_get_drvdata(drm_device);
> @@ -273,6 +277,11 @@ static int __devinit gr2d_probe(struct
> platform_device *dev)
> 
> gr2d->syncpt = host1x_syncpt_alloc(dev, 0);
> if (!gr2d->syncpt) {
> +   /* Do we really need this?
> +  After "host1x_channel_alloc", the refcount of this
> +  channel is 0. So calling host1x_channel_put here will
> +  make the refcount going to negative.
> +  I suppose we should call "host1x_free_channel" here. */

True. I need to also export host1x_channel_free (I will change the name,
too).

> host1x_channel_put(gr2d->channel);
> return -ENOMEM;
> }
> @@ -284,6 +293,7 @@ static int __devinit gr2d_probe(struct
> platform_device *dev)
> 
> err = tegra_drm_register_client(tegradrm, &gr2d->client);
> if (err)
> +   /* Add "host1x_free_channel" */

Will do.

> return err;
> 
> platform_set_drvdata(dev, gr2d);
> diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
> index 3705cae..ed83b9e 100644
> --- a/drivers/gpu/host1x/channel.c
> +++ b/drivers/gpu/host1x/channel.c
> @@ -95,6 +95,9 @@ struct host1x_channel *host1x_channel_alloc(struct
> platform_device *pdev)
> int max_channels = host1x->info.nb_channels;
> int err;
> 
> +   /* Is it necessary to add mutex protection here?
> +  I'm just wondering in a smp system, multiple host1x clients
> +  may try to alloc their channels simultaneously... */

I'll add locking.

> if (chindex > max_channels)
> return NULL;
> 
> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> index 86d5c70..f2bd5aa 100644
> --- a/drivers/gpu/host1x/debug.c
> +++ b/drivers/gpu/host1x/debug.c
> @@ -164,6 +164,10 @@ static const struct file_operations
> host1x_debug_fops = {
> 
>  void host1x_debug_init(struct host1x *host1x)
>  {
> +   /* I think it's better to set this directory name the same with
> +  the driver's name -- defined in dev.c:
> +  #define DRIVER_NAME  "tegra-host1x"
> +  */
> struct dentry *de = debugfs_create_dir("tegra_host", NULL);

Will do.

> 
> if (!de)
> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> index 07e8813..01ed10d 100644
> --- a/drivers/gpu/host1x/dev.c
> +++ b/drivers/gpu/host1x/dev.c
> @@ -38,6 +38,7 @@
> 
>  struct host1x *host1x;
> 
> +/* Called by drm unlocked ioctl function. So do we need a lock here? */
>  void host1x_syncpt_incr_byid(u32 id)

No, host1x_syncpt_incr_byid is SMP safe.

>  {
> struct host1x_syncpt *sp = host1x->syncpt + id;
> @@ -52,6 +53,7 @@ u32 host1x_syncpt_read_byid(u32 id)
>  }
>  EXPORT_SYMBOL(host1x_syncpt_read_byid);
> 
> +/* Called by drm unlocked ioctl function. So do we need a lock here? */
>  int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value)

Same here, SMP safe.

>  {
> struct host1x_syncpt *sp = host1x->syncpt + id;
> @@ -161,6 +163,8 @@ static int host1x_probe(struct platform_device *dev)
> 
> err = host1x_alloc_resources(host);
> if (err) {
> +   /* We don't have chip_ops right now, so here the
> +  error message is somewhat improper */
> dev_err(&dev->dev, "failed to init chip support\n");
> goto fail;
> }

Actually, alloc_resources only allocates intr->syncpt, so I the code to
host1x_intr_init().

> @@ -175,6 +179,14 @@ static int host1x_probe(struct platform_device *dev)
> if (!host->syncpt)
> goto fail;
> 
> +   /* I suggest create a dedicate function for initializing nop sp.
> +  First this "_host1x_syncpt_alloc" looks like an internal/static
> +  function.
> +  Then actually "host1x_syncpt_alloc" & "_host1x_syncpt_alloc" all
> +  serve host1x client(e.g: gr2d) so it's not suitable to use them
> +  for nop sp.
> +  Just create a wrapper function to call _host1x_syncpt_alloc is OK.
> +  This will make the code 

Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-01 Thread Terje Bergström
On 21.12.2012 23:19, Stephen Warren wrote:
> * There's host1x hardware.
> 
> * There's a low-level driver just for host1x itself; the host1x driver.
> 
> * There's a high-level driver for the entire host1x complex of devices.
> That is tegradrm. There may be more high-level drivers in the future
> (e.g. v4l2 camera driver if it needs to aggregate a bunch of host1x
> sub-devices liek tegradrm does).

tegradrm is a driver for a couple of the host1x clients, namely DC and
HDMI, and now adding 2D.

> * The presence of the host1x DT node logically implies that both the two
> drivers in the previous two points should be instantiated.
> 
> * Linux instantiates a single device per DT node.
> 
> * So, it's host1x's responsibility to instantiate the other device(s). I
> think it's reasonable for the host1x driver to know exactly what
> features the host1x HW complex supports; raw host1x access being one,
> and the higher level concept of a tegradrm being another. So, it's
> reasonable for host1x to trigger the instantiation of tegradrm.

tegradrm has drivers for each device that it controls, so tegradrm via
kernel device-driver model instantiates them. host1x driver doesn't need
to do that.

> * If DRM core didn't stomp on the device object's drvdata (or whichever
> field it is), I would recommend not creating a dummy device at all, but
> rather having the host1x driver directly implement multiple driver
> interfaces. There are many examples like this already in the kernel,
> e.g. combined GPIO+IRQ driver, combined GPIO+IRQ+pinctrl driver, etc.

This is the architecture that would imply host1x instantiating
everything, and DRM being a subcomponent of host1x driver. But we didn't
choose to go there. We agreed to have tegradrm and host1x drivers separate.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 5/8] drm: tegra: Remove redundant host1x

2012-12-30 Thread Terje Bergström
On 28.12.2012 23:21, Thierry Reding wrote:
> Instead of going over this back and forth, I've decided to rewrite this
> patch from scratch the way I think it should be done. Maybe that'll make
> things clearer. I haven't tested it on real hardware yet because I don't
> have access over the holidays, but I'll post the patch once I've
> verified that it actually works. The code is based on patches 1-4 of
> this series and is meant to replace patch 5.

I'm still not happy that host1x needs to know about drm. That's just a
wrong dependency, and wrong dependencies always end up biting back. We
need to figure out solution that satisfies both mine and your
requirements and reduce complexity.

DC/HDMI/GR2D probes are using the global data only for managing the
lists "drm_clients" and "clients". "clients" list is only required after
tegra_drm_load(). "drm_clients" is required to establish driver load order.

With dummy device, we can determine the registration (and probe) order.
tegra-drm can register the drivers for DC/HDMI/GR2D devices first, and
as last the device and driver for tegra-drm.

tegra-drm probe will allocate the global data, enumerate all drivers and
add them to the clients list. If one driver is not initialized, it'll
return with -EPROBE_DEFER and will be called again later. When all this
is successful, it'll call drm_platform_init().

The advantages:
 * No management of drm_clients list
 * No mucking with drvdata
 * host1x doesn't need to know about drm
 * The global data is allocated and deallocated by tegra-drm probe and
remove, and accessed only via drm_device->dev_private
 * Much less code

Something like the attached patch - not tested, as I don't have access
to hw now, but it shows the idea. It's based on patches 1-5 in the
series, and could replace patch 5.

Terje
>From a97d475d65e68ce37c345924171dc57c5e7729ee Mon Sep 17 00:00:00 2001
From: Terje Bergstrom 
Date: Sat, 29 Dec 2012 11:43:54 +0200
Subject: [PATCH] drm: tegra: Postpone usage of global data

Change-Id: Ibfd1d4eeb267ac185de4508a1547fb009b80e93a
Signed-off-by: Terje Bergstrom 
---
 drivers/gpu/drm/tegra/dc.c   |   18 -
 drivers/gpu/drm/tegra/drm.c  |  151 +-
 drivers/gpu/drm/tegra/drm.h  |1 -
 drivers/gpu/drm/tegra/gr2d.c |7 --
 drivers/gpu/drm/tegra/hdmi.c |   18 -
 5 files changed, 47 insertions(+), 148 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 24bcd06..73056b7 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -740,7 +740,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 	struct resource *regs;
 	struct tegra_dc *dc;
 	int err;
-	struct tegradrm *tegradrm = platform_get_drvdata(pdev);
 
 	dc = devm_kzalloc(&pdev->dev, sizeof(*dc), GFP_KERNEL);
 	if (!dc)
@@ -780,7 +779,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 	INIT_LIST_HEAD(&dc->client.list);
 	dc->client.ops = &dc_client_ops;
 	dc->client.dev = &pdev->dev;
-	dc->client.tegradrm = tegradrm;
 
 	err = tegra_dc_rgb_probe(dc);
 	if (err < 0 && err != -ENODEV) {
@@ -788,13 +786,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 		return err;
 	}
 
-	err = tegra_drm_register_client(tegradrm, &dc->client);
-	if (err < 0) {
-		dev_err(&pdev->dev, "failed to register tegra drm client: %d\n",
-			err);
-		return err;
-	}
-
 	platform_set_drvdata(pdev, dc);
 
 	return 0;
@@ -803,15 +794,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 static int tegra_dc_remove(struct platform_device *pdev)
 {
 	struct tegra_dc *dc = platform_get_drvdata(pdev);
-	int err;
-
-	err = tegra_drm_unregister_client(dc->client.tegradrm,
-			&dc->client);
-	if (err < 0) {
-		dev_err(&pdev->dev, "failed to unregister tegra_drm client: %d\n",
-			err);
-		return err;
-	}
 
 	clk_disable_unprepare(dc->clk);
 
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 637b621..520c281 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -25,12 +25,6 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
-struct tegra_drm_client_entry {
-	struct tegra_drm_client *client;
-	struct device_node *np;
-	struct list_head list;
-};
-
 static int tegra_drm_add_client(struct device *dev, void *data)
 {
 	static const char * const compat[] = {
@@ -42,102 +36,26 @@ static int tegra_drm_add_client(struct device *dev, void *data)
 		"nvidia,tegra30-gr2d"
 	};
 	struct tegradrm *tegradrm = data;
-	struct tegra_drm_client_entry *client;
+	struct tegra_drm_client *client;
 	unsigned int i;
 
+	/* Check if dev is a tegra-drm device, and add to client list */
 	for (i = 0; i < ARRAY_SIZE(compat); i++) {
 		if (of_device_is_compatible(dev->of_node, compat[i]) &&
 		of_device_is_available(dev->of_node)) {
-			client = kzalloc(sizeof(*client), GFP_KERNEL);
+			client = dev_get_drvdata(dev);
 			if (!client)
-return -ENOMEM;
+return -EPROBE_DEFER;
 
 			INIT_LIST_HEAD(&client->list);
-			client->np = 

Re: [PATCHv4 5/8] drm: tegra: Remove redundant host1x

2012-12-21 Thread Terje Bergström
On 21.12.2012 16:36, Thierry Reding wrote:
> On Fri, Dec 21, 2012 at 01:39:21PM +0200, Terje Bergstrom wrote:
>> +static struct platform_driver tegra_drm_platform_driver = {
>> +.driver = {
>> +.name = "tegradrm",
> 
> This should be "tegra-drm" to match the module name.

We've actually created two problems.

First is that the device name should match driver name which should
match module name. But host1x doesn't know the module name of tegradrm.

Second problem is that host1x driver creates tegradrm device even if
tegradrm isn't loaded to system.

These mean that the device has to be created in tegra-drm module to have
access to the module name. So instead of just getter, we need a getter
and a setter.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-21 Thread Terje Bergström
On 21.12.2012 15:50, Lucas Stach wrote:
> This has to be resolved before merging. Personally I'm in favour of
> keeping reg access patterns close to what is done in other parts of the
> kernel.

I haven't so far received comments from too many people, so that's why I
left it as is.

>>  * Uses still CMA helpers. IOMMU support will replace CMA with host1x
>>specific allocator.
> 
> I really want the allocator issue resolved before talking about merging
> those patches. Proper IOMMU support will require a bit of rework of the
> current upstream IOMMU API, so it will still be a bit out.
> 
> But if you don't mind I'll try to implement the host1x allocator over
> the holidays and provide an incremental patch.

Do you mean host1x CMA allocator using dma mapping API? That'd be great
if you could help out. I've just tried to concentrate on getting ok for
base essentials before doing any extra work.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-20 Thread Terje Bergström
On 21.12.2012 00:28, Stephen Warren wrote:
> On 12/20/2012 02:34 PM, Terje Bergström wrote:
>> On 20.12.2012 22:30, Thierry Reding wrote:
>>> The problem with your proposed solution is that, even any of Stephen's
>>> valid objections aside, it won't work. Once the tegra-drm module is
>>> unloaded, the driver's data will be left in the current state and the
>>> link to the dummy device will be lost.
>>
>> The dummy device is created by tegradrm's module init, because it's used
> 
> No, the tegradrm driver object might be registered by tegradrm's module
> init, but the dummy tegradrm platform device would need to be
> created/registered by host1x's probe. Otherwise, the device would get
> created even if there was no host1x/... in the system (disabled for some
> reason, multi-SoC kernel, ...)

Oh. I was all the time thinking that dummy device needs to be created by
tegradrm, because it's only used by tegradrm.  But as we're mixing the
responsibilities, we might then just as well go all the way.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-20 Thread Terje Bergström
On 20.12.2012 22:30, Thierry Reding wrote:
> The problem with your proposed solution is that, even any of Stephen's
> valid objections aside, it won't work. Once the tegra-drm module is
> unloaded, the driver's data will be left in the current state and the
> link to the dummy device will be lost.

The dummy device is created by tegradrm's module init, because it's used
only by tegradrm. When tegradrm is unloaded, all the tegradrm devices
and drivers are unloaded and freed, including the dummy one.

> I don't think waiting for things to fail is the right option here. If we
> can make out this problem now, then now is the time to fix it. No
> excuses.

Of course not. If we know of something that fails now, let's not do that.

> And quite frankly given how all the various host1x components are
> interlinked I think having a single function that gets the DRM dummy
> device for DRM-related clients isn't that bad.

I like clear responsibilities. tegradrm is responsible for the DRM
interface, and host1x driver is responsible for host1x. tegradrm owns
its data, and can pass its pointers to the functions that need it.

But fine, I still don't like it, but I'll add host1x_set_tegradrm() and
host1x_get_tegradrm(), which act as setter and getter for tegradrm's
global data. I still think it's a solution to a problem we don't have,
but we've wasted an order of magnitude more time debating it than it
takes to implement.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-20 Thread Terje Bergström
On 20.12.2012 19:55, Stephen Warren wrote:
> So it's fine for the tegradrm driver to manipulate the tegradrm
> (virtual) device's drvdata pointer.
> 
> It's not fine for tegradrm to manipulate the drvdata pointer of other
> devices; the host1x children. The drvdata pointer for other devices is
> reserved for those devices' driver's use only.

They're all tegradrm drivers, so tegradrm can manipulate its drvdata.
The other simple way is global pointer. Let's aim now for a simple
solution, and if later we find out it causes problems, let's fix it then.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-20 Thread Terje Bergström
On 20.12.2012 19:14, Stephen Warren wrote:
> I'm not sure that sounds right. drvdata is something that a driver
> should manage itself.
> 
> What's wrong with just having each device ask the host1x (its parent)
> for a pointer to the (dummy) tegradrm device. That seems extremely
> simple, and doesn't require abusing existing stuff like drvdata for
> non-standard purposes.

This is tegradrm's own data, and it's tegradrm which accesses the
pointer. So it's entirely something that the driver takes care of itself.

It's simplest if tegradrm takes care of its own data. That way there's
no chance of confusion of ownership or lifecycle of the data. The only
places which needs this access are the probe functions, and adding an
API contract between two components just for these few calls sounds too
much. All code after that anyway access drm_device->dev_private to get
the pointer.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-20 Thread Terje Bergström
On 16.12.2012 14:16, Thierry Reding wrote:
> Okay, so we're back on the topic of using globals. I need to assert
> again that this is not an option. If we were to use globals, then we
> could just as well leave out the dummy device and just do all of that in
> the tegra-drm driver's initialization function.

I found a way of dropping the global in a straightforward way, and
introduce dummy device for drm_platform_init().

I added dummy device and driver, and moved the tegradrm global
(previously called struct host1x *host1x) allocation to happen in the
probe. In addition, probe calls device tree node traversal to do the
tegra_drm_add_client() calls. The dummy device is owner for this global.

I changed the device tree node traversal so that it goes actually
through each host1x child, checks if it's supported by tegradrm, and if
so, sets its drvdata to point to the tegradrm data.

Each probe will add the client with tegra_drm_register_client(), and
that will find the global via dev_get_drvdata(). In the end of probe,
each driver will replace the drvdata with its own data.

I am also setting the coherent_dma_mask for dummy device so that it can
be used with CMA FB helper.

Would this be ok for you? I could send a patchset with this implemented
by tomorrow and let it simmer for 2-3 weeks due to my and everybody
elses' holidays. I'm hoping we would have 2D acceleration in Linux
kernel 3.9.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-17 Thread Terje Bergström
On 17.12.2012 22:55, Stephen Warren wrote:
> On 12/16/2012 09:37 AM, Terje Bergström wrote:
> ...
>> ... Sure we could tell DC to ask its parent
>> (host1x), and call host1x driver with platform_device pointer found that
>> way, and host1x would return a pointer to tegradrm's data. Hanging the
>> data onto host1x driver is just a more complicated way of implementing
>> global data
> 
> No it's not; at that point, the data is no longer global, but specific
> to the driver instance.

We have only one tegradrm, so the advantage is theoretical - the one
driver gets the same pointer in both cases.

If we use static pointer with an accessor function, we can keep the
solution contained to one source code file and the ownership of data is
clear - tegradrm allocates and deallocates the object and is the sole
user. Code is already in the patchset I sent.

Shared responsibility with host1x and tegradrm would work probably
something like this:

tegradrm creates an object, and passes the reference to host1x
(host1x_set_drm_pdata(host1x_platform_dev, object). host1x sets the
pdata to a member in struct host1x. A getter host1x_get_drm_pdata()
allows retrieving the object. DC would call it with
"host1x_get_drm_pdata(to_platform_device(pdev->dev.parent))".

This assumes that tegradrm would keep ownership of the data and free it
before host1x gets unloaded.

To me this sounds like a steep price and I fail to see the advantage.

Dummy device is something that would come then on top of this mechanism.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 4/7] gpu: host1x: Add debug support

2012-12-17 Thread Terje Bergström
On 13.12.2012 17:23, Joe Perches wrote:
> On Thu, 2012-12-13 at 16:04 +0200, Terje Bergstrom wrote:
>> Add support for host1x debugging. Adds debugfs entries, and dumps
>> channel state to UART in case of stuck job.
> 
> trivial note:
> 
> []
> 
>> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
> []
>> +void host1x_debug_output(struct output *o, const char *fmt, ...);
> 
> This should be marked __printf(2, 3)
> so the compiler verifies format and argument types.

Thanks, I didn't know of this "trick". I'll apply it in the next version.

Considering the amount of feedback I've received from the patches, they
must be top notch quality!

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-16 Thread Terje Bergström
On 16.12.2012 14:16, Thierry Reding wrote:
> Okay, so we're back on the topic of using globals. I need to assert
> again that this is not an option. If we were to use globals, then we
> could just as well leave out the dummy device and just do all of that in
> the tegra-drm driver's initialization function.
> The whole point of all this is to link the host1x and it's particular
> tegra-drm instance. I will see if I can find the time to implement this
> in the existing driver, so that the host1x code that you want to remove
> registers the tegra-drm dummy device and the tegra-drm devices use the
> accessors as discussed previously to access host1x and the dummy device.
> Once that is implemented, removing the original host1x implementation
> should be much easier since you will only have to keep the dummy device
> instantiation along with the one or two accessor functions.

I'm not sure what you have discussed with Stephen, so I might be missing
the reason why this is a problem that needs to be solved with new
dependency between tegradrm and host1x instead of locally in tegradrm
driver itself.

As far I remember, we had two reasons for discussing the dummy device.
First reason is for DC, HDMI probe calls to find the global data. Second
is giving something to DRM framework's drm_platform_init().

The easiest way to solve the probe problem is just to have a tegradrm
accessor for the global data in the way I proposed in the patchset.
Dummy device doesn't help there, as the dummy device is in no
relationship to DC and HDMI. Sure we could tell DC to ask its parent
(host1x), and call host1x driver with platform_device pointer found that
way, and host1x would return a pointer to tegradrm's data. Hanging the
data onto host1x driver is just a more complicated way of implementing
global data, and it's breaking the responsibility split between host1x
driver and tegradrm. To me, host1x driver is responsible of host1x, and
tegradrm is responsible of the DRM interface and everything related to that.

All other parts of code use drm_device->dev_private for getting the
global data, so the access problem is only for the probe calls.

drm_platform_init() needing a device is another problem.
drm_platform_init() leads to a call to the CMA FB helper. That again
needed the coherent_dma_mask set for the device give to it. I guess that
problem can be solved by just setting the mask to 0x. But that
is still something that can be handled inside tegradrm without involving
the host1x driver.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-14 Thread Terje Bergström
On 14.12.2012 18:21, Stephen Warren wrote:
> On 12/13/2012 11:09 PM, Terje Bergström wrote:
>> They want to get the global data by getting drvdata of their parent.
> 
> There's no reason that /has/ to be so; they can get any information from
> wherever it is, rather than trying to require it to be somewhere it isn't.

Exactly, this was also my solution.

>> The dummy device is not their parent - host1x is. There's no
>> connection between the dummy and the real client devices.
> 
> It's quite possible for the client devices to ask their actual parent
> (host1x) for the tegradrm struct device *, by calling a host1x API, and
> have host1x implement that API by returning the dummy/virtual device
> that it created. That should be just 1-2 lines of code to implement in
> host1x, plus maybe a couple more to correctly return -EPROBE_DEFERRED
> when appropriate.

My solution was to just have one global, and an getter for the global.
Instead of drvdata, the pointer is retrieved with the getter. No need
for dummy device, or passing points between host1x and tegradrm.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-13 Thread Terje Bergström
On 13.12.2012 19:58, Stephen Warren wrote:
> On 12/13/2012 01:57 AM, Thierry Reding wrote:
>> After some more discussion with Stephen on IRC we came to the
>> conclusion that the easiest might be to have tegra-drm call into
>> host1x with something like:
>>
>> void host1x_set_drm_device(struct host1x *host1x, struct device
>> *dev);
> 
> If host1x is registering the dummy device that causes tegradrm to be
> instantiated, then presumably there's no need for the API above, since
> host1x will already have the struct device * for tegradrm, since it
> created it?

I didn't add the dummy device in my latest patch set. I first set out to
add it, and moved the drm global data to be drvdata of that device. Then
I noticed that it doesn't actually help at all.

The critical accesses to the global data are from probes of DC, HDMI,
etc. They want to get the global data by getting drvdata of their
parent. The dummy device is not their parent - host1x is. There's no
connection between the dummy and the real client devices. So there's no
logical way for DC and HDMI to find the the dummy device, except perhaps
by searching for "tegradrm" device from platform bus. But then again,
that'd break encapsulation so it's as bad as a global variable - only
with a lot more code to do the same thing.

Accesses after probing can happen via drm->dev_private, so post-probe
we're fine.

Another problem arouse (already mentioned it) when I used the dummy
device to call to drm_platform_init(). It called back into tegradrm,
which calls the CMA FB helper to allocate memory. Unfortunately the
memory is allocated for the dummy device, and it's not initialized to do
do that. I ended up with failed frame buffer allocation. That needs
host1x allocator to fix.

host1x itself shouldn't need any DRM specific calls or callbacks. The
device model already allows traversing through list of host1x children.
The list of drm clients and devices is something that tegradrm needs to
be able to initialize DRM at appropriate time. I also took that into use
for storing the channel and class data. So we should try to keep the
list maintenance as local to tegradrm as we can.

In my latest patch set, I kept the list management inside tegradrm, and
put all DRM global data into struct tegradrm, which is accessed via a
global. I guess global in tegradrm is not as bad as global in host1x,
because one DRM driver can handle multiple devices, so there's no reason
to have multiple tegradrm's trampling on each others data. But if you
can come up with a better solution, I'm all ears.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 0/7] Support for Tegra 2D hardware

2012-12-13 Thread Terje Bergström
On 13.12.2012 17:03, Lucas Stach wrote:
> You are still doing the allocation the IMHO wrong way around. I thought
> we agreed to do all the allocations in host1x, which obviously means not
> using the cma_gem_helpers anymore, but introducing a new native host1x
> object to back GEM/V4L/whatever objects. IMHO the current approach is a
> clear layering violation and makes proper IOMMU support a lot harder. It
> would also allow to get rid of all the indirections and ifdefs in host1x
> memmgr, as host1x would only have to deal with it's native objects.
> 
> All the complexity of converting host1x to GEM objects should be located
> in tegradrm and not be scattered between different modules.
> 
> Did you leave this out on purpose in this version of the patchset?

Forgot to mention that, as IOMMU and consequently the "proper"
allocation support was planned as a follow-up. I wanted to keep the
scope of this set as small as possible.

The plan we agreed on still holds.

Terje

>>  * tegradrm has a global variable. Plan was to hide that behind a
>>virtual device, and use that as DRM root device. That plan went
>>bad once the FB CMA helper used the device for trying to allocate
>>memory.
> See above, we should get rid of the helpers and do all allocations
> within host1x.

I noticed that IOMMU and not using the CMA FB helper is now exynos
managed to do this.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-13 Thread Terje Bergström
On 12.12.2012 18:08, Thierry Reding wrote:
> I've briefly discussed this with Stephen on IRC because I thought I had
> remembered him objecting to the idea of adding a dummy device just for
> this purpose. It turns out, however, that what he didn't like was to add
> a dummy node to the DT just to make this happen, but he has no (strong)
> objections to a dummy platform device.
> 
> While I'm not very happy about that solution, I've been going over it
> for a week now and haven't come up with any better alternative that
> doesn't have its own disadvantages. So perhaps we should go ahead and
> implement that. For the host1x driver this really just means creating a
> platform device and adding it to the system, with some of the fields
> tweaked to make things work.

Even the virtual device is not too beautiful. The problem is that the
virtual device is not physical parent for DC, HDMI, etc, so
dev_get_drvdata(pdev->dev.parent) returns the data from host1x device,
not the virtual device.

We'll post with something that goes around this, but it's not going to
be too pretty. Let's try to find the solution once we get the code out.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-12 Thread Terje Bergström
On 12.12.2012 18:08, Thierry Reding wrote:
> I've briefly discussed this with Stephen on IRC because I thought I had
> remembered him objecting to the idea of adding a dummy device just for
> this purpose. It turns out, however, that what he didn't like was to add
> a dummy node to the DT just to make this happen, but he has no (strong)
> objections to a dummy platform device.
> 
> While I'm not very happy about that solution, I've been going over it
> for a week now and haven't come up with any better alternative that
> doesn't have its own disadvantages. So perhaps we should go ahead and
> implement that. For the host1x driver this really just means creating a
> platform device and adding it to the system, with some of the fields
> tweaked to make things work.

This sounds ok. Considering that the DRM driver is a virtual driver,
attaching it to a virtual device sounds ok -- ish.

> Is this something that you can take care of in your patch series? I
> could also implement this on top of the current driver and then all your
> patch series would have to do is remove host1x.c from tegra-drm and
> instantiate the platform device itself.

We're about to post patches for both host1x driver and libdrm. This is
the last big rock unturned, so let me take a stab at this first. I'd
like to get the updated patches out tomorrow or on Friday, so I'll try
to either get it done for that or leave it to the TODO list.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-10 Thread Terje Bergström
On 05.12.2012 14:04, Thierry Reding wrote:
> On Wed, Dec 05, 2012 at 01:47:38PM +0200, Terje Bergström wrote:
>> You're right in that binding to a sub-device is not a nice way. DRM
>> framework just needs a "struct device" to bind to. exynos seems to solve
>> this by introducing a virtual device and bind to that. I'm not sure if
>> this is the best way, but worth considering?
> 
> That was discussed a few months back already and nobody seemed to like
> the idea. In fact it was as a result of that discussion that Stephen
> brought up the idea to register the DRM driver from a central host1x
> driver (it may also have been part of a discussion on IRC, I don't
> remember exactly).
> 
> At the time I spent some time on a patch that introduced drm_soc_init()
> to solve this by creating a dummy struct device and registering the
> driver on top of that. But I abandoned it in favour of fixing the DRM
> platform support code. The approach also didn't provide for the proper
> encapsulation.

I've managed to go through all the other feedback and implement a
solution to most of them, so now I have some slack to actually think
about the initialization. Sorry about this, but you (meaning all the
reviewers) did give us a _lot_ to do. :-) Fortunately, the driver
actually became a lot better, too.

Back to the topic of tegradrm init. The root cause of the problem is
that DRM framework needs some device to assign itself to. The problem is
that this device doesn't have any physical counterpart, as it's only for
storing a pointer in DRM framework. Please correct me if this is wrong.

Moving the client registration to ping pong between DRM and host1x has
its problems. host1x driver itself has no use for a list of client
devices. It can just iterate its children in case it needs them. In
tegradrm, you need a list of devices under tegradrm control, which might
or might not be the same as list of devices under host1x hardware.

The solutions that many other DRM drivers seem to employ are the virtual
devices. Exynos and OMAP drivers do this, as does SH Mobile DRM driver.
So I think I'd still go this way, as it's the path of minimum
resistance, least amount of code and most localized change. I know it's
not ideal, but I'd also not like to get stuck in this.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC,v2,1/8] video: tegra: Add nvhost driver

2012-12-10 Thread Terje Bergström
On 29.11.2012 11:10, Mark Zhang wrote:
>> +
>> +/**
>> + * Write a cpu syncpoint increment to the hardware, without touching
>> + * the cache. Caller is responsible for host being powered.
>> + */
>> +static void host1x_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id)
>> +{
>> + struct nvhost_master *dev = syncpt_to_dev(sp);
>> + u32 reg_offset = id / 32;
>> +
>> + if (!nvhost_module_powered(dev->dev)) {
>> + dev_err(&syncpt_to_dev(sp)->dev->dev,
>> + "Trying to access host1x when it's off");
>> + return;
>> + }
>> +
>> + if (!nvhost_syncpt_client_managed(sp, id)
>> + && nvhost_syncpt_min_eq_max(sp, id)) {
>> + dev_err(&syncpt_to_dev(sp)->dev->dev,
>> + "Trying to increment syncpoint id %d beyond max\n",
>> + id);
>> + return;
>> + }
>> + writel(BIT_MASK(id), dev->sync_aperture +
>> + host1x_sync_syncpt_cpu_incr_r() + reg_offset * 4);
> 
> I have a stupid question: According to the name and the context of this
> function, seems it increases the syncpt value which specified by param
> "id". So how does this "writel" increase the value? I don't know much
> about host1x/syncpt reg operations, so could you explain a little bit or
> I just completely have a wrong understanding?

I believe I've implemented most of the requests in this mail, but I seem
to have missed answering this question.

writel() to that register invokes a method in host1x, which increments
the sync point indicated by the value of the register by one.

Best regards,
Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-05 Thread Terje Bergström
On 05.12.2012 14:04, Thierry Reding wrote:
> On Wed, Dec 05, 2012 at 01:47:38PM +0200, Terje Bergström wrote:
> Yes, but there's more. For instance I went to great lengths to make sure
> there's no global data whatsoever. So all the data is bound to the
> host1x device in the current code. I know many other drivers like to
> take a shortcut and just put these things into global variables but I
> didn't want to.

Sure, that feedback is in the todo list. For some parts I already have a
proposed solution, but for chip ops not yet.

>> The problem with doing drm_platform_init() with host1x device as
>> parameter is that drm_get_platform_dev() will take control of drvdata.
>> We'd need to put host1x specific struct host1x pointer to some other
>> place and I'm not sure what that place could be.
> 
> Not anymore. I submitted a patch so that it no longer does that. The
> patch was merged about a month and a half ago.

Oops, I must've checked the status from a stale tree. Thanks for that.
In this case there's no technical reason why host1x couldn't act as the
device to register for drm, as drm wouldn't touch host1x device data in
any way.

How about if we treat the host1x driver as utility library, and move the
code for host1x probe altogether to tegradrm/host1x.c? I haven't yet
checked how bad the dependencies between host1x's driver's host1x.c and
the rest of the driver are, so I'm not sure if it's feasible and if
there would be a clear design. Just tossing in ideas.

That would solve the circular dependency problem. I've already committed
to contents of host1x.c being very different in downstream and upstream,
so this change would just emphasize that.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-05 Thread Terje Bergström
On 05.12.2012 13:13, Thierry Reding wrote:
> What I propose is to move the client registration code that is currently
> in drivers/gpu/drm/tegra/host1x.c to the host1x driver, which may or may
> not end up in drivers/gpu/host1x.

Ok.

> 
>> host1x hardware access must be encapsulated in host1x driver
>> (drivers/gpu/host1x if that's the location we prefer). As for the
>> management of the list of DRM clients, we proposed the move to drm.c,
>> because host1x hardware would anyway be controlled by a different
>> driver. Having file called host1x.c in tegradrm didn't sound logical, as
>> its not really controlling host1x, and its probe wouldn't be called.
> 
> Oh well, at the time nobody from NVIDIA was involved so I wrote that
> code in preparation for proper host1x support that I thought I would
> have to add myself at some point. I'm more than glad that I don't have
> to do this all by myself. However the patch proposed in this series
> breaks a number of requirements such as proper encapsulation, which I
> already mentioned in more detail in another mail.

Hmm, I'm not sure if I remember that you refer to by the proper
encapsulation. Is that the fact that we bind DRM to a sub-client?

> The problem that this solves is that the DRM driver needs to be bound to
> a specific platform device. None of the DRM subdevices are suitable
> because they are only part of the whole DRM device. I think that host1x
> is the only device that fits here.
> 
> Note that this is only an administrative problem. It shouldn't interfere
> with the way host1x works. The goal is that the DRM device is registered
> at the proper hierarchical location.
> 
> The circular dependency is indeed a problem, though. Quite frankly I
> have no idea how to solve this. However the approach taken in the
> current patch will break several other requirements as I already
> explained.

The problem with doing drm_platform_init() with host1x device as
parameter is that drm_get_platform_dev() will take control of drvdata.
We'd need to put host1x specific struct host1x pointer to some other
place and I'm not sure what that place could be.

You're right in that binding to a sub-device is not a nice way. DRM
framework just needs a "struct device" to bind to. exynos seems to solve
this by introducing a virtual device and bind to that. I'm not sure if
this is the best way, but worth considering?

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-05 Thread Terje Bergström
On 05.12.2012 10:33, Thierry Reding wrote:
> I've been thinking about this some more and came to the conclusion that
> since we will already have a tight coupling between host1x and tegra-drm
> we may just as well keep the client registration code in host1x. The way
> I imagine this to work would be to export a public API from tegra-drm,
> say tegra_drm_init() and tegra_drm_exit(), that could be called in place
> of drm_platform_init() and drm_platform_exit() in the current code.
> 
> tegra_drm_init() could then be passed the host1x platform device to bind
> to. The only thing that would need to be done is move the fields in the
> host1x structure specific to DRM into a separate structure. host1x would
> have to export host1x_drm_init/exit() which the DRM can invoke to have
> all DRM clients register to the DRM subsystem.
> 
> From a hierarchical point of view this makes sense, with host1x being
> the parent of all DRM subdevices. It allows us to reuse the current code
> from tegra-drm that has been tested and works properly even for module
> unload/reload. We also get to keep the proper encapsulation and the
> switch to the separate host1x driver will require a much smaller patch.
> 
> Does anybody see a disadvantage in this approach?

I'm a bit confused about the scope. You mention host1x several times,
but I'm not sure if you mean the file drivers/gpu/drm/tegra/host1x.c or
the host1x driver. So I might be babbling when I answer this. Could you
please clarify that?

host1x hardware access must be encapsulated in host1x driver
(drivers/gpu/host1x if that's the location we prefer). As for the
management of the list of DRM clients, we proposed the move to drm.c,
because host1x hardware would anyway be controlled by a different
driver. Having file called host1x.c in tegradrm didn't sound logical, as
its not really controlling host1x, and its probe wouldn't be called.

If your proposal is that we'd move the management of the list of host1x
devices from tegradrm to host1x driver, we'd have a tight circular
dependency between two drivers and that's almost always a bad idea. So
far all ideas have revolved around tegradrm calling host1x, and host1x
calling a bit of DRM (for CMA, would be fixed in later version) but not
host1x calling tegradrm.

host1x driver itself has only little use for the list of clients.
Basically we need only a list of channels, and platform devices
associated with channels, to be able to dump host1x channel state.

Mind you, I believe nvhost driver as part of our BSP has had quite many
more hours of runtime than tegradrm. :-)

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-12-03 Thread Terje Bergström
On 03.12.2012 23:03, Thierry Reding wrote:
> What's really difficult to follow is if an ops structure is accessed via
> some global macro. It also breaks encapsulation because you have a
> global ops structure. That may even work fine for now, but it will break
> once you have more than a single host1x in a system. I know this will
> never happen, but all of a sudden it happens anyway and the code breaks.
> Doing this right isn't very hard and it will lead to a better design and
> is less likely to break at some point.

I agree that the chip ops access goes through too much indirection and
macro magic (which I already declared I hate), so we're going to design
something simpler.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-12-02 Thread Terje Bergström
On 02.12.2012 22:55, Thierry Reding wrote:
> FWIW I'm convinced that you're genuinely trying to make this work and
> nobody welcomes this more than me. However it is only natural if you
> dump such a large body of code on the community that people will
> disagree with some of the design decisions.
> 
> So when I comment on the design or patches in general, it is not my
> intention to exclude you or NVIDIA in any way. All I'm trying to do is
> spot problematic or unclear parts that will make working with the code
> any more difficult than it has to be.

Thanks, I know it'a a large dump and you've made great comments about
the code, and hit most of the sore spots I've had with the driver, too.
I appreciate your effort - this process is making the driver better.

It's good to hear that our goals are aligned. So now that we got that
out of the system, let's get back to business. :-)

> Since tegra-drm is a DRM driver it should stay in drivers/gpu/drm. I can
> also live with the host1x driver staying in drivers/video, but I don't
> think it's the proper location and drivers/gpu/host1x seems like a much
> better fit.

That sounds like a plan to me.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-12-02 Thread Terje Bergström
On 01.12.2012 23:42, Dave Airlie wrote:
> Guys I think you guys might be overthniking things here.
> 
> I know you have some sort of upstream/downstream split, but really in
> the upstream kernel, we don't care about that, so don't make it our
> problem.

I am not trying to make anything your problem. Most of the issues we
have already worked out with a good solution that all active
participants have agreed with. We have only a couple of disagreements
with Thierry.

My goal is to get a good open source co-operation and trying to prevent
a code fork while still maintaining good design. That way everybody
wins. The way to do that is to base our BSP on upstream kernel.

I'm not trying to here throw code over the fence and flee. This is a
genuine attempt to work together. I want to prevent the "we" (kernel
community excluding NVIDIA) and "you" (NVIDIA) that a split code base
would cause in the long run. I'd like to just talk about "we" including
NVIDIA.

> There is no need for any sort of stable API between host1x and the sub
> drivers, we change APIs in the kernel the whole time it isn't a
> problem.
> 
> If you need to change the API, submit a single patch changing it
> across all the drivers in the tree, collecting Acks or not as needed.
> We do this the whole time, I've never had or seen a problem with it.
> 
> We don't do separate subsystems APIs set in stone bullshit, and all
> subsystem maintainers are used to dealing with these sort of issues.
> You get an ack from one maintainer and the other one sticks it in his
> tree with a note to Linus.
> 
> You can put the code where you want, maybe just under drivers/gpu
> instead of drivers/video or drivers/gpu/drm, just make sure you have a
> path for it into the kernel.

Follows exactly my thinking, as the location of host1x driver has no
practical consequence to me.

Thierry proposed drivers/gpu/host1x. I'd like to see a couple of
comments on that proposal, and if it sticks, follow that.

Thierry, did you mean that host1x driver would be in drivers/gpu/host1x,
and tegradrm in drivers/gpu/drm/tegra, or would we put both in same
directory?

> And I have an non-upstream precedent for v4l sitting on drm, some
> radeon GPUs have capture tuners, and the only way to implement that
> would be to stick a v4l driver in the radeon drm driver. Not a
> problem, just never finished writing the code.

Yes, I just mentioned that as awkward, but I have no problem with any path.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-12-01 Thread Terje Bergström
On 01.12.2012 19:34, Lucas Stach wrote:
> This would certainly make life easier, but personally I don't think it's
> the right thing to do. The separation of the Linux kernel into different
> subsystems was done for a reason and just because the specific hardware
> at hands happens to work a bit different is no valid reason to break
> with the standard rules of the kernel.
> 
> So I think there is no way around handling the different drivers that
> use host1x in different trees. For the time being there is _only_
> tegra-drm using host1x in the upstream kernel. We have to make sure to
> come up with some API which is reasonably stable, so we don't run into
> big problems later. That's why I'm really in favour to keep host1x and
> tegra-drm side by side in the current upstream, to make sure we can
> change the API without jumping through too much hoops.
> 
> Your downstream V4L would have to use host1x from the DRM directory, but
> really: is your downstream such a nice, clean codebase that you are not
> able to cope with the slight ugliness of this solution?

Ok, can do. I'll move the code base to drivers/gpu/drm/tegra/host1x. For
downstream, the host1x driver implements all user space APIs (no drm, no
v4l, etc) so the directory is of no consequence. If we immersed host1x
driver totally with tegra-drm, that'd be a problem, but if I can keep a
separation, that's fine.

> Please make sure to remove any unnecessary cruft from host1x in the
> process and don't try to make too big of a step at once. We only need
> one type of memory within host1x: native host1x objects, no need to plan
> for support of anything else. Also taking over ownership of the IOMMU
> address space might take some more work in the IOMMU API. We can leave
> this out completely for a start. Both Tegra 2 and 3 should be able to
> work with CMA backed objects just fine.

Ok, that simplifies the process. I'll just implement firewall and copy
the strema over to kernel space unconditionally.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-12-01 Thread Terje Bergström
On 01.12.2012 16:58, Thierry Reding wrote:
> I don't know where you see politics in what I said. All I'm saying is
> that we shouldn't be making things needlessly complex. In my experience
> the technically cleanest solution is usually the one with the least
> complexity.

Let me come up with a proposal and let's then see where to go next.

> But you already have extra code in the kernel to patch out expired sync-
> points. Is it really worth the added effort to burden userspace with
> this? If so I still think some kind of generic IOCTL to retrieve
> information about a syncpoint would be better than a sysfs interface.

That's exactly why I mentioned that it's not useful to upstream. There
are some cases where user space might want to check if a fence has
passed without waiting for it, but that's marginal and could be handled
even with waits with zero timeout.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 0/8] Support for Tegra 2D hardware

2012-12-01 Thread Terje Bergström
On 01.12.2012 16:45, Thierry Reding wrote:
> I did some prototyping on how a libdrm API could look like a few weeks
> back. I should clean the patches up some and push them to a public
> repository or to the mailing lists for review.

Ok. Sorry about the delay - I recently learned I need separate
permission for user space contribution, so I'm pushing to get that
permission.

> There isn't actually much more than a bit of framework along with two
> IOCTLs that allow creating and looking up a Tegra-specific GEM. The
> related kernel patches aren't available anywhere since I didn't deem
> them ready yet. At that time I wasn't even sure if we'd need special
> allocations other than what the dumb BO infrastructure provides. They
> implement some parts of what you've implemented in this series as well,
> with some slight differences.

Ok, the BO infra is still under flux as we're designing the best place
and work split.

> Currently these still use the CMA-backed GEM objects but it should be
> easy to switch to something backed by the host1x infrastructure once
> that's in good shape.

Sounds good.

> While I can't find the quote right now, I seem to remember that you said
> at some point that you were planning on adding some 2D acceleration bits
> to libdrm. I don't think that's the right place. That code should rather
> go into the DDX. libdrm should instead provide a thin layer on top of
> the DRM IOCTLs to manage buffers and submit command streams. I hope I
> can finish the cleanup of my libdrm patches over the weekend and push
> them out so this may become clearer. Maybe I can even get the
> corresponding kernel patches pushed out.

Yep, that's exactly what I actually posed as a question in one of the
earlier mails. We also agree that 2D bits should not stay in libdrm.
That's why we've kept the 2D bits design-wise separate from the host1x
stream generation.

We don't yet have any other place to put 2D functions in, so we'll
probably post them as part of patch series to libdrm. We'll just add a
disclaimer that the 2D code won't remain in libdrm, and wanted to get
the code out to review as a code example. We can put the 2D code either
to a separate library or to DDX, whichever is preferred.

The host1x command stream generation would still remain in libdrm. That
seems to be the pattern with other hardware.

Best regards,
Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-12-01 Thread Terje Bergström
On 01.12.2012 17:10, Thierry Reding wrote:
> On Sat, Dec 01, 2012 at 01:44:41PM +0200, Terje Bergström wrote:
>> host1x module being in DRM directory hinders using nvhost from anywhere
>> outside DRM in both upstream and downstream.
> 
> That's not true. Nothing keeps the rest of the kernel from using an API
> exported by the tegra-drm driver.

Right, it's just a directory. I was actually thinking that it'd be weird
if a V4L2 driver would use something from inside drivers/gpu/drm/tegra
(V4L use DRM? Oh no!).

Shoot the idea down if it's crazy, but please think about it first. :-)

I started thinking about this and we are constrained by the Linux kernel
subsystems that have a complete different architecture than hardware.
This leads to awkward design as DRM design as it conflicts with the way
hardware works.

Placing host1x driver in one place, DRM driver in another and XYZ driver
in yet another is not ideal either. We're exposing a public API which
needs to be strictly maintained, because we maintain drivers in
different trees, but then again, the list of users is very static and
well-defined, so public API is an overshoot.

How about if we look at this from the hardware architecture point of
view? You mentioned that perhaps drivers/bus/host1x would be the best
place for host1x driver.

What if we put also all host1x client modules under that same directory?
drivers/bus/host1x/drm would be for DRM interface, and all other host1x
client module drivers could be placed similarly. This way we could keep
the host1x API private to host1x and the client module drivers, and it's
easy to understand how host1x is used by just following the directory
structure.

Naturally, we could also think if we want to have sub-components per
host1x client (dc, 2d, etc) and a drm sub-component that implements the
DRM interface, and a V4L2 sub-component that implements V4L2 interface
(when/if I can convince people that camera should go upstream).

>> I also don't like first putting the driver in one place, and then
>> moving it with a huge commit to another place.
> 
> Hehe, you're doing exactly that in this patch series. =)

True, I guess it's just a matter of determining what's the best time.

> Yes, there would be a certain amount of synchronization needed, but as
> Stephen correctly pointed out we could do that move through one tree
> with the Acked-by of the other maintainer. The point is that we need to
> do this once instead of everytime the API changes.

Yep, inter-tree synchronization is possible, so not a show stopper.

> An entry for drivers/gpu/drm/tegra/host1x would override an entry for
> drivers/gpu/drm/tegra so no need to exclude it. That said, there's no
> way to exclude an subdirectory in MAINTAINERS that I know of.

I saw tag X: in MAINTAINERS file, so that could be used. There's
documentation for it, but also some examples like:

IBM Power Virtual SCSI/FC Device Drivers
M:  Robert Jennings 
L:  linux-s...@vger.kernel.org
S:  Supported
F:  drivers/scsi/ibmvscsi/
X:  drivers/scsi/ibmvscsi/ibmvstgt.c

> My main point for keeping host1x within tegra-drm for now was that it
> could possibly help speed up the inclusion of the host1x code. Seeing
> that there's still a substantial amount of work to be done and a need
> for discussion I'm not sure if rushing this is the best way. In that
> case there may be justification for putting it in a separate location
> from the start.

I'm not in a hurry, so let's try to figure the best design first.
Biggest architectural unsolved problem is the memory management and
relationship between tegradrm and host1x driver. What Lucas proposed
about memory management makes sense, but it'll take a while to implement it.

The rest of the unsolved questions are more about differences in
opinion, and solvable.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-12-01 Thread Terje Bergström
On 01.12.2012 15:42, Daniel Vetter wrote:
> Out of sheer curiosity: What are you using the coverage data of these
> register definitions for? When I looked into coverage analysis the
> resulting data seemed rather useless to me, since the important thing
> is how well we cover the entire dynamic state space of the hw+sw (e.g.
> crap left behind by the bios ...) and coverage seemed to be a poor
> proxy for that. Hence why I wonder what you're doing with this data

Yes, it's a poor proxy. But still, I use it to determine how big
portions of hardware address space and fields I'm touching when running
host1x tests. It's interesting data for planning tests, but not much more.

Best regards,
Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-12-01 Thread Terje Bergström
On 30.11.2012 10:50, Lucas Stach wrote:
> I'm with Thierry here. I think there is a fair chance that we won't get
> the API right from the start, even when trying to come up with something
> that sounds sane to everyone. It's also not desirable to delay gr2d
> going into mainline until we are all completely satisfied with the API.
> 
> I also fail to see how host1x module being in the DRM directory hinders
> any downstream development. So I'm in favour of keeping host1x besides
> the other DRM components to lower the burden for API changes and move it
> out into some more generic directory, once we feel confident that the
> API is reasonable stable.

host1x module being in DRM directory hinders using nvhost from anywhere
outside DRM in both upstream and downstream. I also don't like first
putting the driver in one place, and then moving it with a huge commit
to another place. We'd just postpone exactly the problems that were
indicated earlier: we'd need to synchronize two trees to remove code in
one and add in another at the same time so that there wouldn't be
conflicting host1x drivers. I'd rather just add it in final place once,
and be done with it.

But if it's a make-it-or-brake-it for upstreaming, I can move it to be a
subdirectory under drivers/gpu/drm/tegra. Would this mean that we'd
modify the MAINTAINER's file so that the tegradrm entry excludes host1x
sub-directory, and I'd add another one which included only the host1x
sub-directory? The host1x part would be Supported, whereas rest of
tegradrm is Maintained.

Best regards,
Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-12-01 Thread Terje Bergström
On 30.11.2012 12:38, Thierry Reding wrote:
> * PGP Signed by an unknown key
> The above implies that when you submit code it shouldn't contain pieces
> that prepare for possible future extensions which may or may not be
> submitted (the exception being if such changes are part of a series
> where subsequent patches actually use them). The outcome is that the
> amount of cruft in the mainline kernel is kept to a minimum. And that's
> a very good thing.

We're now talking about actually a separation of logical and physical
driver. I can't see why that's a bad thing. Especially considering that
it's standard practice in well written drivers. Let's try to find a
technical clean solution instead of debating politics. The latter should
never be part of Linux kernel reviews.

>> I could do that for upstream. In downstream it cannot depend on DEBUG
>> flag, as these spews are an important part of how we debug problems with
>> customer devices and the DEBUG flag is never on in customer builds.
> 
> So I've just looked through these patches once more and I can't find
> where this functionality is actually used. The host1x_syncpt_debug()
> function is assigned to the nvhost_syncpt_ops.debug member, which in
> turn is only used by nvhost_syncpt_debug(). The latter, however is
> never used (not even by the debug infrastructure introduced in patch
> 4).

I have accidentally used the syncpt_op().debug() version directly. I'll
fix that.

> Okay, so what you're sayingCan here is that a huge number of people haven't
> been wise in using the preprocessor for register definitions all these
> years. That's a pretty bold statement. Now I obviously haven't looked at
> every single line in the kernel, but I have never come across this usage
> for static inline functions used for this. So, to be honest, I don't
> think this is really up for discussion. Of course if you come up with an
> example where this is done in a similar way I could be persuaded
> otherwise.

We must've talked about a bit different things. For pure register defs,
I can accommodate changing to #defines. We'd lose the code coverage
analysis, though, but if the parentheses are a make-or-break question to
upstreaming, I can change.

I was thinking of definitions like this:

static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v)
{
return (v & 0x1ff) << 0;
}

versus

#define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) ((v) >> 16) & 0x3ff

Both of these produce the same machine code and have same usage, but the
latter has type checking and code coverage analysis and the former is
(in my eyes) clearer. In both of these cases the usage is like this:

writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1)
| host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid)
| host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr),
m->sync_aperture + host1x_sync_cfpeek_ctrl_r());

> But I don't see how that's relevant here. Let me quote what you said
> originally:
> 
>> This is actually the only interface to read the max value to user space,
>> which can be useful for doing some comparisons that take wrapping into
>> account. But we could just add IOCTLs and remove the sysfs entries.
> 
> To me that sounded like it was only used for debugging purposes. If you
> actually need to access this from a userspace driver then, as opposed to
> what I said earlier, this should be handled by some IOCTL.

There's a use for production code to know both the max and min, but I
think we can just scope that use out from this patch sest.

User space can use these two for checking if one of their fences has
already passed by comparing if the current value is between min and
fence, taking wrapping into account. In these cases user space can f.ex.
leave a host1x wait out from a command stream.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-11-30 Thread Terje Bergström
On 29.11.2012 13:47, Thierry Reding wrote:
> On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
>> Tegra20 and Tegra30 are compatible, but future chips are not. I was
>> hoping we would be ready in upstream kernel for future chips.
> 
> I think we should ignore that problem for now. Generally planning for
> any possible combination of incompatibilities leads to overgeneralized
> designs that require precisely these kinds of indirections.
> 
> Once some documentation for Tegra 40 materializes we can start thinking
> about how to encapsulate the incompatible code.

I think here our perspectives differ a lot. That is natural considering
the company I work for and company you work for, so let's try to sync
the perspective.

In my reality, whatever is in market is old news and I barely work on
them anymore. Upstreaming activity is the exception. 90% of my time is
spent dealing with future chips which I know cannot be handled without
this split to logical and physical driver parts.

For you, Tegra2 and Tegra3 are the reality.

If we move nvhost in upstream a bit incompatible, that's fine, like
ripping out features or adding new new stuff, like a new memory type.
All of this I can support with a good diff tool to get all the patches
flowing between upstream and downstream.

If we do fundamental changes that prevent bringing the code back to
downstream, like removing this abstraction, the whole process of
upstream and downstream converging hits a brick wall. We wouldn't have
proper continuing co-operation, but just pushing code out and being done
with it.

> I noticed that it was filled with content in one of the subsequent
> patches. Depending on how this gets merged eventually you could postpone
> adding the function until the later patch. But perhaps once the code has
> been properly reviewed we can just squash the patches again. We'll see.

Ok, thanks.

>> True. I might also as well delete the general interrupt altogether, as
>> we don't use it for any real purpose.
> 
> I think it might still be useful for diagnostics. It seems to be used
> when writes time out. That could still be helpful information when
> debugging problems.

It's actually a stale comment. The client units are not signaling
anything useful with the interrupt. There's use for it in downstream,
but that's irrelevant here.

> Making this generic for all modules may not be what we want as it
> doesn't allow devices to handle things themselves if necessary. Clock
> management is just part of the boiler plate that every driver is
> supposed to cope with. Also the number of clocks is usually not higher
> than 2 or 3, so the pain is manageable. =)
> 
> Furthermore doing this in loops may not work for all modules. Some may
> require additional delays between enabling the clocks, others may be
> able to selectively disable one clock but not the other(s).

Yes, but I'll just rip the power management code out, so we can postpone
this until we have validated and verified the runtime PM mechanism
downstream.

>> I could move this to debug.c, but it's debugging aid when a command
>> stream is misbehaving and it spews this to UART when sync point wait is
>> timing out. So not debugfs stuff.
> 
> Okay, in that case it should stay in. Perhaps convert dev_info() to
> dev_dbg(). Perhaps wrapping it in some #ifdef CONFIG_TEGRA_HOST1X_DEBUG
> guards would also be useful. Maybe not.

I could do that for upstream. In downstream it cannot depend on DEBUG
flag, as these spews are an important part of how we debug problems with
customer devices and the DEBUG flag is never on in customer builds.

> The problem is not with autogenerated files in general. The means by
> which they are generated are less important. However, autogenerated
> files often contain a lot of unneeded definitions and contain things
> such as "autogenerated - do not edit" lines.
> 
> So generally if you generate the content using some scripts to make sure
> it corresponds to what engineering gave you, that's okay as long as you
> make sure it has the correct form and doesn't contain any cruft.

I can remove the boilerplate, that's not a problem. In general, we have
tried to be very selective about what we generate, so that it matches
what we're using.

>> I like static inline because I get the benefit of compiler type
>> checking, and gcov shows me which register definitions have been used in
>> different tests.
> 
> Type checking shouldn't be necessary for simple defines. And I wasn't
> aware that you could get the Linux kernel to write out data to be fed to
> gcov.
> 
>> #defines are always messy and I pretty much hate them. But if the
>> general request is to use #define's, eve

Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-29 Thread Terje Bergström
On 29.11.2012 14:14, Thierry Reding wrote:
> On Thu, Nov 29, 2012 at 10:09:13AM +0100, Lucas Stach wrote:
>> This way you would also be able to construct different handles (like GEM
>> obj or V4L2 buffers) from the same backing nvhost object. Note that I'm
>> not sure how useful this would be, but it seems like a reasonable design
>> to me being able to do so.
> 
> Wouldn't that be useful for sharing buffers between DRM and V4L2 using
> dma-buf? I'm not very familiar with how exactly importing and exporting
> work with dma-buf, so maybe I need to read up some more.

I would still preserve the dma-buf support, for exactly this purpose.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Terje Bergström
Just replying to part of your mail.

On 30.11.2012 09:22, Thierry Reding wrote:
> Actually for the display controller we want just a notification when the
> VBLANK happens. I'm not sure if we want to do that with syncpoints at
> all since it works quite well using regular interrupts.

VBLANK isn't actually a very good example of dc's use of sync points.
That can easily be done with regular interrupts, as you mention.

More important is when we have double buffering enabled. When you draw
something to a surface, and flip it to display, you want DC to notify
when the flip has been done and rendering can continue to the back buffer.

So, what you can do is return a fence from DC when initiating a flip,
and place that fence into 2D stream as a host wait so that 2D will
patiently wait for buffer to become free before it renders.

> What I'm proposing is to leave it up to each host1x client how they want
> to handle this. For display controllers it may be enough to have their
> callback run in interrupt context but other clients may need to do more
> work so they can queue it themselves.

DC doesn't need to worry about host1x interrupts at all. It's all
internal to the host1x driver, so we're now just talking about the
internal implementation of host1x.

We have two scenarios for the syncpt interrupts. One is that a job got
finished and we need to clean up the queue and free up resources. This
must be done in threads. Other is releasing a thread that is blocked by
a syncpt wait.

It's simpler if both of these are handled with the same infrastructure,
and we've shown that latency is very good even if we handle all events
in a thread.

> I know that this looks like it might be more work, but if it turns out
> that many drivers need to do the exact same thing, that functionality
> can be factored out into a helper. But it may just as well turn out that
> the requirements for each module are slightly different that forcing a
> workqueue on them could result in ugly workarounds because it doesn't
> quite work for them.

This is just driver internal, so there's no need for other drivers to
access this part.

> If we move responsibility of managing the workqueue out of host1x as I
> proposed above, maybe a lot of this code can be removed. Maybe you can
> explain a bit what they are used for exactly in your write-up.

It's going to be a big bad boy. :-)

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-11-29 Thread Terje Bergström
On 29.11.2012 20:34, Stephen Warren wrote:
> On 11/29/2012 03:21 AM, Terje Bergström wrote:
>> True. I might also as well delete the general interrupt altogether, as
>> we don't use it for any real purpose.
> 
> Do make sure the interrupts still are part of the DT binding though, so
> that the binding fully describes the HW, and the interrupt is available
> to retrieve if we ever do use it in the future.

Sure, I will just not use the generic irq in DT, but it won't require
any changes in DT bindings.

> You can still create tables of clocks inside the driver and loop over
> them. So, loop unrolling isn't related to my comments at least. It's
> just that clk_get() shouldn't take its parameters from platform data.
> 
> But if these are clocks for (arbitrary) child modules (that may or may
> not exist dynamically), why aren't the drivers for the child modules
> managing them?

There are actually two things here that I mixed, and because of that I
probably confused everybody else.

Let's rip out the ACM. ACM is generic to all modules, and in nvhost owns
the clocks. That's why list of clocks and their frequency policies have
been part of the device description in nvhost. ACM is being replaced
with runtime PM in downstream kernel, but it still requires rigorous
testing and analysis of power profile before we can move to it.

Then, the second thing is that nvhost_probe() has had its own loop to go
through the clocks of host1x module. It's copy-paste of what ACM did,
which is just bad design. That's easily replaceable with static code, as
nvhost_probe() is just for host1x. I'll do that, and as I rip out the
generic power management code, I'll also make 2D and host1x drivers
enable the clocks at probe with static code.

So I think we have a solution that resonates with all proposals.

Best regards,
Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-29 Thread Terje Bergström
On 29.11.2012 11:09, Lucas Stach wrote:
> We should aim for a clean split here. GEM handles are something which is
> really specific to how DRM works and as such should be constructed by
> tegradrm. nvhost should really just manage allocations/virtual address
> space and provide something that is able to back all the GEM handle
> operations.
> 
> nvhost has really no reason at all to even know about GEM handles. If
> you back a GEM object by a nvhost object you can just peel out the
> nvhost handles from the GEM wrapper in the tegradrm submit ioctl handler
> and queue the job to nvhost using it's native handles.
> 
> This way you would also be able to construct different handles (like GEM
> obj or V4L2 buffers) from the same backing nvhost object. Note that I'm
> not sure how useful this would be, but it seems like a reasonable design
> to me being able to do so.

Ok, I must say that I got totally surprised by this and almost fell off
the bench of the bus while commuting to home and reading this mail. On
the technical side, what you wrote makes perfect sense and we'll go
through this idea very carefully, so don't take me wrong.

What surprised me was that we had always assumed that nvmap, the
allocator we use in downstream kernel, would never be something that
would be accepted upstream, so we haven't done work at all on cleaning
it up and refactoring it for upstreaming, and cutting ties between
nvhost and nvmap. We assumed that we need to provide something that fit
into tegradrm and interacts with dma_buf and GEM, so we've written
something small that fulfills this need.

Now what you're suggesting is akin to getting a subset of nvmap into
picture. In downstream kernel it already takes care of all memory
management problems we've discussed wrt IOMMU (duplicate management,
different memory architectures, etc). But, it has a lot more than what
we need for now, so we'd need to decide if we go for importing parts of
nvmap as nvhost allocator, or use the allocator in the patchset I sent
earlier as basis.

>> Yep, this would definitely simplify our IOMMU problem. But, I thought
>> the canonical way of dealing with device memory is DMA API, and you're
>> saying that we should just bypass it and call IOMMU directly?
>>
> This is true for all standard devices. But we should not consider this
> as something set in stone and then building some crufty design around
> it. If we can manage to make our design a lot cleaner by managing DMA
> memory and the corresponding IOMMU address spaces for the host1x devices
> ourselves, I think this is the way to go. All other graphics drivers in
> the Linux kernel have to deal with their GTT in some way, we just happen
> to do so by using a shared system IOMMU and not something that is
> exclusive to the graphics devices.
> 
> This is more work on the side of nvhost, but IMHO the benefits make it
> look worthwhile.
> What we should avoid is something that completely escapes the standard
> ways of dealing with memory used in the Linux kernel, like using
> carveout areas, but I think this is already consensus among us all.

Makes perfect sense. I'll need to hash out a proposal on how to go about
this.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 3/8] video: tegra: host: Add channel and client support

2012-11-29 Thread Terje Bergström
On 29.11.2012 12:04, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Mon, Nov 26, 2012 at 03:19:09PM +0200, Terje Bergstrom wrote:
> 
> I've skipped a lot of code here that I need more time to review.

Thanks already for the very good comments! It's great getting comments
on the code from fresh eyes.

> Looking some more at how this is used, I'm starting to think that it
> might be easier to export the various handlers and allow them to be
> passed to the nvhost_intr_add_action() explicitly.

Oh, so you mean like "nvhost_intr_add_action(intr, id, threshold,
nvhost_intr_action_submit_complete, channel, waiter, priv), and
nvhost_intr_action_submit_complete is the function pointer?

There's one case to take care of: we merge the waits for the jobs into
one waiter to save us from having too many irq calls. Perhaps that could
be handled by a flag, or something like that.

>> +/* Magic to use to fill freed handle slots */
>> +#define BAD_MAGIC 0xdeadbeef
> 
> This isn't currently used.

Will remove.

> 
>> +static size_t job_size(u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
>> +{
>> + u32 num_unpins = num_cmdbufs + num_relocs;
>> + s64 total;
>> +
>> + if (num_relocs < 0 || num_waitchks < 0 || num_cmdbufs < 0)
>> + return 0;
>> +
>> + total = sizeof(struct nvhost_job)
>> + + num_relocs * sizeof(struct nvhost_reloc)
>> + + num_unpins * sizeof(struct nvhost_job_unpin_data)
>> + + num_waitchks * sizeof(struct nvhost_waitchk)
>> + + num_cmdbufs * sizeof(struct nvhost_job_gather);
>> +
>> + if (total > ULONG_MAX)
>> + return 0;
>> + return (size_t)total;
>> +}
>> +
>> +
>> +static void init_fields(struct nvhost_job *job,
>> + u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
>> +{
>> + u32 num_unpins = num_cmdbufs + num_relocs;
>> + void *mem = job;
>> +
>> + /* First init state to zero */
>> +
>> + /*
>> +  * Redistribute memory to the structs.
>> +  * Overflows and negative conditions have
>> +  * already been checked in job_alloc().
>> +  */
>> + mem += sizeof(struct nvhost_job);
>> + job->relocarray = num_relocs ? mem : NULL;
>> + mem += num_relocs * sizeof(struct nvhost_reloc);
>> + job->unpins = num_unpins ? mem : NULL;
>> + mem += num_unpins * sizeof(struct nvhost_job_unpin_data);
>> + job->waitchk = num_waitchks ? mem : NULL;
>> + mem += num_waitchks * sizeof(struct nvhost_waitchk);
>> + job->gathers = num_cmdbufs ? mem : NULL;
>> + mem += num_cmdbufs * sizeof(struct nvhost_job_gather);
>> + job->addr_phys = (num_cmdbufs || num_relocs) ? mem : NULL;
>> +
>> + job->reloc_addr_phys = job->addr_phys;
>> + job->gather_addr_phys = &job->addr_phys[num_relocs];
>> +}
> 
> I wouldn't bother splitting out the above two functions.

You're right, I'll merge them back in. There was a historical reason for
the split, but not anymore.

> 
>> +
>> +struct nvhost_job *nvhost_job_alloc(struct nvhost_channel *ch,
>> + int num_cmdbufs, int num_relocs, int num_waitchks)
>> +{
>> + struct nvhost_job *job = NULL;
>> + size_t size = job_size(num_cmdbufs, num_relocs, num_waitchks);
>> +
>> + if (!size)
>> + return NULL;
>> + job = vzalloc(size);
> 
> Why vzalloc()?

I guess it's basically moot, but we tried that when we had some memory
fragmentation issues and it was left even though we did find out it's
not needed.

>> +void nvhost_job_add_gather(struct nvhost_job *job,
>> + u32 mem_id, u32 words, u32 offset)
>> +{
>> + struct nvhost_job_gather *cur_gather =
>> + &job->gathers[job->num_gathers];
>> +
>> + cur_gather->words = words;
>> + cur_gather->mem_id = mem_id;
>> + cur_gather->offset = offset;
>> + job->num_gathers += 1;
> 
> job->num_gathers++

OK.

>> +static int pin_job_mem(struct nvhost_job *job)
>> +{
>> + int i;
>> + int count = 0;
>> + int result;
>> + long unsigned *ids =
>> + kmalloc(sizeof(u32 *) *
>> + (job->num_relocs + job->num_gathers),
>> + GFP_KERNEL);
> 
> Maybe this should be allocated along with the nvhost_job and the other
> fields to avoid having to allocate, and potentially fail, here?

Yes, you're right.

>> + __raw_writel(
>> + (job->reloc_addr_phys[i] +
>> + reloc->target_offset) >> reloc->shift,
>> + (cmdbuf_page_addr +
>> + (reloc->cmdbuf_offset & ~PAGE_MASK)));
> 
> You're not writing to I/O memory, so you shouldn't be using
> __raw_writel() here.

Will change.

> 
>> +int nvhost_job_pin(struct nvhost_job *job, struct platform_device *pdev)
>> +{
>> + int err = 0, i = 0, j = 0;
>> + struct nvhost_syncpt *sp = &nvhost_get_host(pdev)->syncpt;
>> + unsigned long wait

Re: [RFC,v2,3/8] video: tegra: host: Add channel and client support

2012-11-29 Thread Terje Bergström
On 29.11.2012 12:01, Mark Zhang wrote:
>> +fail:
>> +/* Add clean-up */
> 
> Yes, add "nvhost_module_deinit" here?

Sounds good.

>> +int nvhost_client_device_suspend(struct platform_device *dev)
>> +{
>> +int ret = 0;
>> +struct nvhost_device_data *pdata = platform_get_drvdata(dev);
>> +
>> +ret = nvhost_channel_suspend(pdata->channel);
>> +dev_info(&dev->dev, "suspend status: %d\n", ret);
>> +if (ret)
>> +return ret;
>> +
>> +return ret;
> 
> Minor issue: just "return ret" is OK. That "if" doesn't make sense.

Yes, must be some snafu when doing changes in code.

>> -struct nvhost_chip_support *nvhost_chip_ops;
>> +static struct nvhost_chip_support *nvhost_chip_ops;
>>  
> 
> All right, already fixed here. Sorry, so just ignore what I said about
> this in my reply to your patch 1.

I was wondering about this, because I thought I did make it static. But
it looks like I added that to the wrong commit. Anyway, this needs
rethinking.

>> +struct mem_handle *nvhost_dmabuf_get(u32 id, struct platform_device *dev)
>> +{
>> +struct mem_handle *h;
>> +struct dma_buf *buf;
>> +
>> +buf = dma_buf_get(to_dmabuf_fd(id));
>> +if (IS_ERR_OR_NULL(buf))
>> +return (struct mem_handle *)buf;
>> +
>> +h = (struct mem_handle *)dma_buf_attach(buf, &dev->dev);
>> +if (IS_ERR_OR_NULL(h))
>> +dma_buf_put(buf);
> 
> Return an error here.

Will do.

>> +op->nvhost_dev.alloc_nvhost_channel = t20_alloc_nvhost_channel;
>> +op->nvhost_dev.free_nvhost_channel = t20_free_nvhost_channel;
>> +
> 
> I recall in previous version, there is t30-related alloc_nvhost_channel
> & free_nvhost_channel. Why remove them?

I could actually refactor these all into one alloc channel. We already
store the number of channels in a data type, so a generic channel
allocator would be better than having a chip specific one.

>> +static int push_buffer_init(struct push_buffer *pb)
>> +{
>> +struct nvhost_cdma *cdma = pb_to_cdma(pb);
>> +struct nvhost_master *master = cdma_to_dev(cdma);
>> +pb->mapped = NULL;
>> +pb->phys = 0;
>> +pb->handle = NULL;
>> +
>> +cdma_pb_op().reset(pb);
>> +
>> +/* allocate and map pushbuffer memory */
>> +pb->mapped = dma_alloc_writecombine(&master->dev->dev,
>> +PUSH_BUFFER_SIZE + 4, &pb->phys, GFP_KERNEL);
>> +if (IS_ERR_OR_NULL(pb->mapped)) {
>> +pb->mapped = NULL;
>> +goto fail;
> 
> Return directly here. "goto fail" makes "push_buffer_destroy" get called.

Will do.

> 
>> +}
>> +
>> +/* memory for storing mem client and handles for each opcode pair */
>> +pb->handle = kzalloc(NVHOST_GATHER_QUEUE_SIZE *
>> +sizeof(struct mem_handle *),
>> +GFP_KERNEL);
>> +if (!pb->handle)
>> +goto fail;
>> +
>> +/* put the restart at the end of pushbuffer memory */
> 
> Just for curious, why "pb->mapped + 1K" is the end of a 4K pushbuffer?

pb->mapped is u32 *, so compiler will take care of multiplying by
sizeof(u32).

>> +unsigned int nvhost_cdma_wait_locked(struct nvhost_cdma *cdma,
>> +enum cdma_event event)
>> +{
>> +for (;;) {
>> +unsigned int space = cdma_status_locked(cdma, event);
>> +if (space)
>> +return space;
>> +
>> +/* If somebody has managed to already start waiting, yield */
>> +if (cdma->event != CDMA_EVENT_NONE) {
>> +mutex_unlock(&cdma->lock);
>> +schedule();
>> +mutex_lock(&cdma->lock);
>> +continue;
>> +}
>> +cdma->event = event;
>> +
>> +mutex_unlock(&cdma->lock);
>> +down(&cdma->sem);
>> +mutex_lock(&cdma->lock);
> 
> I'm newbie of nvhost but I feel here is very tricky, about the lock and
> unlock of this mutex: cdma->lock. Does it require this mutex is locked
> before calling this function? And do we need to unlock it before the
> code: "return space;" above? IMHO, this is not a good design and can we
> find out a better solution?

Yeah, it's not perfect and good solutions are welcome.
cdma_status_locked() must be called with a mutex. But, what we generally
wait for is for space in push buffer. The cleanup code cannot run if we
keep cdma->lock, so I release it.

The two ways to loop are because there was a race between two processes
waiting for space. One of them set cdma->event to indicate what it's
waiting for and can go to sleep, but the other has to keep spinning.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Terje Bergström
On 29.11.2012 10:44, Thierry Reding wrote:
>> diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
>> index 98c9c9f..025a820 100644
>> --- a/drivers/video/tegra/host/dev.c
>> +++ b/drivers/video/tegra/host/dev.c
>> @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
>>  }
>>  EXPORT_SYMBOL(host1x_syncpt_read);
>>
>> +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
> 
> The choice of data types is odd here. id refers to a syncpt so a better
> choice would have been unsigned int because the size of the variable
> doesn't actually matter. But as I already said in my reply to patch 1,
> these are resources and should therefore better be abstracted through an
> opaque pointer anyway.
> 
> timeout is usually signed long, so this function should reflect that. As
> for the value this is probably fine as it will effectively be set from a
> register value. Though you also cache them in software using atomics.

32-bits is an architectural limit for the sync point id, so that's why I
used it here. But you're right - it doesn't really matter and could be
changed to unsigned long.

thresh and *value reflects that sync point value is 32-bit, and I'd keep
that as is.

Timeout should be unsigned long, yes.

>> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c 
>> b/drivers/video/tegra/host/host1x/host1x_intr.c
> [...]
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "nvhost_intr.h"
>> +#include "host1x/host1x.h"
>> +
>> +/* Spacing between sync registers */
>> +#define REGISTER_STRIDE 4
> 
> Erm... no. The usual way you should be doing this is either make the
> register definitions account for the stride or use accessors that apply
> the stride. You should be doing the latter anyway to make accesses. For
> example:
> 
> static inline void host1x_syncpt_writel(struct host1x *host1x,
> unsigned long value,
> unsigned long offset)
> {
> writel(value, host1x->regs + SYNCPT_BASE + offset);
> }
> 
> static inline unsigned long host1x_syncpt_readl(struct host1x *host1x,
> unsigned long offset)
> {
> return readl(host1x->regs + SYNCPT_BASE + offset);
> }
> 
> Alternatively, if you want to pass the register index instead of the
> offset, you can use just multiply the offset in that function:
>
> writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
>
> The same can also be done with the non-syncpt registers.

The register number has a stride of 4 when doing writes, and 1 when
adding to command streams. This is why I've kept the register
definitions as is.

I could add helper functions. Just as a side note, the sync register
space has other definitions than just the syncpt registers, so the
naming should be changed a bit.

>> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
>> +{
>> + struct nvhost_master *dev = dev_id;
>> + void __iomem *sync_regs = dev->sync_aperture;
>> + struct nvhost_intr *intr = &dev->intr;
>> + unsigned long reg;
>> + int i, id;
>> +
>> + for (i = 0; i < dev->info.nb_pts / BITS_PER_LONG; i++) {
>> + reg = readl(sync_regs +
>> + host1x_sync_syncpt_thresh_cpu0_int_status_r() +
>> + i * REGISTER_STRIDE);
>> + for_each_set_bit(id, ®, BITS_PER_LONG) {
>> + struct nvhost_intr_syncpt *sp =
>> + intr->syncpt + (i * BITS_PER_LONG + id);
>> + host1x_intr_syncpt_thresh_isr(sp);
>> + queue_work(intr->wq, &sp->work);
>> + }
>> + }
>> +
>> + return IRQ_HANDLED;
>> +}
> 
> Maybe it would be better to call the syncpt handlers in interrupt
> context and let them schedule work if they want to. I'm thinking about
> the display controllers which may want to use syncpoints for VBLANK
> support.

Display controller can use the APIs to read, increment and wait for sync
point.

We could do more in isr, but then again, we've noticed that the current
design already gives pretty good latency, so we haven't seen the need to
move code from thread to isr.

>> +static void host1x_intr_init_host_sync(struct nvhost_intr *intr)
>> +{
>> + struct nvhost_master *dev = intr_to_dev(intr);
>> + void __iomem *sync_regs = dev->sync_aperture;
>> + int i, err, irq;
>> +
>> + writel(0xUL,
>> + sync_regs + host1x_sync_syncpt_thresh_int_disable_r());
>> + writel(0xUL,
>> + sync_regs + host1x_sync_syncpt_thresh_cpu0_int_status_r());
>> +
>> + for (i = 0; i < dev->info.nb_pts; i++)
>> + INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
>> +
>> + irq = platform_get_irq(dev->dev, 0);
>> + WARN_ON(IS_ERR_VALUE(irq));
>> +   

Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-11-29 Thread Terje Bergström
On 28.11.2012 23:23, Thierry Reding wrote:
> This could be problematic. Since drivers/video and drivers/gpu/drm are
> separate trees, this would entail a continuous burden on keeping both
> trees synchronized. While I realize that eventually it might be better
> to put the host1x driver in a separate place to accomodate for its use
> by other subsystems, I'm not sure moving it here right away is the best
> approach.

I understand your point, but I hope also that we'd end up with something
that can be used as basis for the downstream kernel to migrate to
upstream stack.

The key point here is to make the API between nvhost and tegradrm as
small and robust to changes as possible.

> I'm not sure drivers/video is the best location either. Perhaps
> drivers/bus would be better? Or maybe we need a new subdirectory for
> this kind of device.

This is a question I don't have an answer to. I'm perfectly ok moving it
wherever the public opinion leads it to.

> I think the general nowadays is to no longer use filenames in comments.

Ok, I hadn't noticed that. I'll remove them. It's redundant information
anyway.

>> +struct nvhost_chip_support *nvhost_chip_ops;
>> +
>> +struct nvhost_chip_support *nvhost_get_chip_ops(void)
>> +{
>> + return nvhost_chip_ops;
>> +}
> 
> This seems like it should be more tightly coupled to the host1x device.
> And it shouldn't be a global variable.

Yeah, I will figure out a better way to handle the chip ops. I'm not too
happy with it. Give me a bit of time to come up with a good solution.

>> +struct output;
> 
> What's this? It doesn't seem to be used anywhere.

It's just a mistake. The struct is used in debug code, but not referred
to in this file so the forward declaration is not needed.

>> +struct nvhost_master;
> 
> Why do you suffix this with _master? The whole point of host1x is to be
> the "master" so you can just as well call it nvhost, right? Ideally
> you'd call it host1x, but I'm repeating myself. =)

Yes, the name is just a historic relict and I'm blind to them as I've
been staring at the code for so long. I think "host1x" would be a good
name for the struct.

>> +struct nvhost_syncpt_ops {
>> + void (*reset)(struct nvhost_syncpt *, u32 id);
>> + void (*reset_wait_base)(struct nvhost_syncpt *, u32 id);
>> + void (*read_wait_base)(struct nvhost_syncpt *, u32 id);
>> + u32 (*update_min)(struct nvhost_syncpt *, u32 id);
>> + void (*cpu_incr)(struct nvhost_syncpt *, u32 id);
>> + void (*debug)(struct nvhost_syncpt *);
>> + const char * (*name)(struct nvhost_syncpt *, u32 id);
>> +};
> 
> Why are these even defined as ops structure? Tegra20 and Tegra30 seem to
> be compatible when it comes to handling syncpoints. I thought they would
> even be compatible in all other aspects as well, so why even have this?

Tegra20 and Tegra30 are compatible, but future chips are not. I was
hoping we would be ready in upstream kernel for future chips.

>> +#define syncpt_op()  (nvhost_get_chip_ops()->syncpt)
> 
> You really shouldn't be doing this, but rather use explicit accesses for
> these structures. If you're design doesn't scatter these definitions
> across several files then it isn't difficult to obtain the correct
> pointers and you don't need these "shortcuts".

Do you mean that I would move the ops to be part of f.ex. nvhost_syncpt
or nvhost_intr structs?

> This API looks odd. Should syncpoints not be considered as regular
> resources, much like interrupts? In that case it would be easier to
> abstract them away behind an opaque type. It looks like you already use
> the struct nvhost_syncpt to refer to the set of syncpoints associated
> with a host1x device.
> 
> How about you use nvhost/host1x_syncpt to refer to individual syncpoints
> instead. You could export an array of those from your host1x device and
> implement a basic resource allocation mechanism on top, similar to how
> other resources are handled in the kernel.
> 
> So a host1x client device could call host1x_request_syncpt() to allocate
> a syncpoint from it's host1x parent dynamically along with passing a
> name and a syncpoint handler to it.

That might work. I'll think about that - thanks.

>> +bool host1x_powered(struct platform_device *dev)
>> +{
> [...]
>> +}
>> +EXPORT_SYMBOL(host1x_powered);
>> +
>> +void host1x_busy(struct platform_device *dev)
>> +{
> [...]
>> +}
>> +EXPORT_SYMBOL(host1x_busy);
>> +
>> +void host1x_idle(struct platform_device *dev)
>> +{
> [...]
>> +}
>> +EXPORT_SYMBOL(host1x_idle);
> 
> These look like a reimplementation of the runtime power-management
> framework.

Yes, we at some point tried to move to use runtime PM. The first attempt
was thwarted by runtime PM and system suspend conflicting with each
other. I believe this is pretty much fixed in later versions of kernel.

Also, the problem was that runtime PM doesn't support multiple power
states. In downstream kernel, we support clock gating and power gating.
When we moved to runtime PM and impleme

Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-29 Thread Terje Bergström
On 28.11.2012 20:46, Lucas Stach wrote:
> Am Mittwoch, den 28.11.2012, 18:23 +0200 schrieb Terje Bergström:
>> Sorry. I promised in another thread a write-up explaining the design. I
>> still owe you guys that.
> That would be really nice to have. I'm also particularly interested in
> how you plan to do synchronization of command streams to different
> engines working together, if that's not too much to ask for now. Like
> userspace uploading a texture in a buffer, 2D engine doing mipmap
> generation, 3D engine using mipmapped texture.

I can briefly explain (and then copy-paste to a coherent text once I get
to it) how inter-engine synchronization is done. It's not specifically
for 2D or 3D, but generic to any host1x client.

Sync point register is a counter that can only increase. It starts from
0 and is incremented by a host1x client or CPU. host1x can freeze a
channel until a sync point value is reached, and it can trigger an
interrupt upon reaching a threshold. On Tegra2 and Tegra3 we have 32
sync points.

host1x clients all implement a method for incrementing a sync point
based on a condition, and on all of them (well, not entirely true) the
register is number 0. The most used condition is op_done, telling the
client to increment sync point once the previous operations are done.

In kernel, we keep track of the active range of sync point values, i.e.
ones we expect to be reached with the active jobs. Active range's
minimum is the current value read from hw and shadowed in memory. At job
submit time, kernel increments the maximum by the number of sync points
the stream announces it will perform. After performing the increment, we
have a number, which the sync point is supposed to reach at the end of
submit. That number is the the fence and it is recorded in kernel and
returned to user space.

So, when user space flushes, it receives a fence. It can insert the
fence into another command stream as parameter to a host1x channel wait.
This makes that channel freeze until an operation in another channel is
finished. That's how different host1x clients can synchronize without
using CPU.

Kernel's sync point wait essentially puts the process into sleep until
host1x sends an interrupt and we determine the value that a process is
waiting for, has been reached.

On top of this, we guard against wrapping issues by nulling out any sync
point waits (CPU or inside stream) that are waiting for values outside
the active range, and we have a timeout for jobs so that we can kick out
misbehaving command streams.

> Ah yes I see. So if we consider nvhost to be the central entity in
> charge of controlling all host1x clients and tegradrm as the interface
> that happens to bundle display, 2d and 3d engine functionality into it's
> interface we should probably aim for two things:
> 1. Move everything needed by all engines down into nvhost (I especially
> see the allocator falling under this point, I'll explain why this would
> be beneficial a bit later)

Ok. This is almost the current design, except for the allocator.

> 2. Move the exposed DRM interface more in line with other DRM drivers.
> Please take a look at how for example the GEM_EXECBUF ioctl works on
> other drivers to get a feeling of what I'm talking about. Everything
> using the display, 2D and maybe later on the 3D engine should only deal
> with GEM handles. I really don't like the idea of having a single
> userspace application, which uses engines with similar and known
> requirements (DDX) dealing with dma-buf handles or other similar high
> overhead stuff to do the most basic tasks.
> If we move down the allocator into nvhost we can use buffers allocated
> from this to back GEM or V4L2 buffers transparently. The ioctl to
> allocate a GEM buffer shouldn't do much more than wrapping the nvhost
> buffer.

Ok, this is actually what we do downstream. We use dma-buf handles only
for purposes where they're really needed (in fact, none yet), and use
our downstream allocator handles for the rest. I did this, because
benchmarks were showing that memory management overhead shoot through
the roof if I tried doing everything via dma-buf.

We can move support for allocating GEM handles to nvhost, and GEM
handles can be treated just as another memory handle type in nvhost.
tegradrm would then call nvhost for allocation.


> This may also solve your problem with having multiple mappings of the
> same buffer into the very same address space, as nvhost is the single
> instance that manages all host1x client address spaces. If the buffer is
> originating from there you can easily check if it's already mapped. For
> Tegra 3 to do things in an efficient way we likely have to move away
> from dealing with the DMA API to dealing with the IOMMU API, this gets a
> _lot_ easier_ if you have a single point wher

Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-28 Thread Terje Bergström
On 28.11.2012 17:13, Lucas Stach wrote:
> To be honest I still don't grok all of this, but nonetheless I try my
> best.

Sorry. I promised in another thread a write-up explaining the design. I
still owe you guys that.

> Anyway, shouldn't nvhost be something like an allocator used by host1x
> clients? With the added ability to do relocs/binding of buffers into
> client address spaces, refcounting buffers and import/export dma-bufs?
> In this case nvhost objects would just be used to back DRM GEM objects.
> If using GEM objects in the DRM driver introduces any cross dependencies
> with nvhost, you should take a step back and ask yourself if the current
> design is the right way to go.

tegradrm has the GEM allocator, and tegradrm contains the 2D kernel
interface. tegradrm contains a dma-buf exporter for the tegradrm GEM
objects.

nvhost accepts jobs from tegradrm's 2D driver. nvhost increments
refcounts and maps the command stream and target memories to devices,
maps the command streams to kernel memory, replaces the placeholders in
command streams with addresses with device virtual addresses, and unmaps
the buffer from kernel memory. nvhost uses dma buf APIs for all of the
memory operations, and relies on dmabuf for refcounting. After all this
the command streams are pushed to host1x push buffer as GATHER (kind of
a "gosub") opcodes, which reference to the command streams.

Once the job is done, nvhost decrements refcounts and updates pushbuffer
pointers.

The design is done so that nvhost won't be DRM specific. I want to
enable creating V4L2 etc interfaces that talk to other host1x clients.
V4L2 (yeah, I know nothing of V4L2) could pass frames via nvhost to EPP
for pixel format conversion or 2D for rotation and write result to frame
buffer.

Do you think there's some fundamental problem with this design?

>> Taking a step back - 2D streams are actually very short, in the order of
>> <100 bytes. Just copying them to kernel space would actually be faster
>> than doing MMU operations.
>>
> Is this always the case because of the limited abilities of the gr2d
> engine, or is it just your current driver flushing the stream very
> often?

It's because of limited abilities of the hardware. It just doesn't take
that many operations to invoke 2D.

The libdrm user space we're created flushes probably a bit too often
now, but even in downstream the streams are not much longer.  It takes
still at least a week to get the user space code out for you to look at.

> In which way is it a good design choice to let the CPU happily alter
> _any_ buffer the GPU is busy processing without getting the concurrency
> right?

Concurrency is handled with sync points. User space will know when a
command stream is processed and can be reused by comparing the current
sync point value, and the fence that 2D driver returned to user space.
User space can have a pool of buffers and can recycle when it knows it
can do so. But, this is not enforced by kernel.

The difference with your proposal and what I posted is the level of
control user space has over its command stream management. But as said,
2D streams are so short that my guess is that there's not too much
penalty copying it to kernel managed host1x push buffer directly instead
of inserting a GATHER reference.

> Please keep in mind that the interfaces you are now trying to introduce
> have to be supported for virtually unlimited time. You might not be able
> to scrub your mistakes later on without going through a lot of hassles.
> 
> To avoid a lot of those mistakes it might be a good idea to look at how
> other drivers use the DRM infrastructure and only part from those proven
> schemes where really necessary/worthwhile.

Yep, as the owner of this driver downstream, I'm also leveraging my
experience with the graphics stack in our downstream software stack that
is accessible via f.ex. L4T.

This is exactly the discussion we should be having, and I'm learning all
the time, so let's continue tossing around ideas until we're both happy
with the result.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-28 Thread Terje Bergström
On 28.11.2012 16:06, Lucas Stach wrote:
> Why do even need/use dma-buf for this use case? This is all one DRM
> device, even if we separate host1x and gr2d as implementation modules.

I didn't want to implement dependency to drm gem objects in nvhost, but
we have thought about doing that. dma-buf brings quite a lot of
overhead, so implementing support for gem buffers would make the
sequence a bit leaner.

nvhost already has infra to support multiple memory managers.

> So standard way of doing this is:
> 1. create gem object for pushbuffer
> 2. create fake mmap offset for gem obj
> 3. map pushbuf using the fake offset on the drm device
> 4. at submit time zap the mapping
> 
> You need this logic anyway, as normally we don't rely on userspace to
> sync gpu and cpu, but use the kernel to handle the concurrency issues.

Taking a step back - 2D streams are actually very short, in the order of
<100 bytes. Just copying them to kernel space would actually be faster
than doing MMU operations.

I think for Tegra20 and non-IOMMU case, we just need to copy the command
stream to kernel buffer. In Tegra30 IOMMU case reference to user space
buffers are fine, as tampering the streams doesn't have any ill effects.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-28 Thread Terje Bergström
On 28.11.2012 15:33, Lucas Stach wrote:
> So this is obviously wrong. Userspace has to allocate a pushbuffer from
> the kernel just as every other buffer, then map it into it's own address
> space to push in commands. At submit time of the pushbuf kernel has to
> make sure that userspace is not able to access the memory any more, i.e.
> kernel shoots down the vma or pagetable of the vma. To keep overhead low
> and not do any blocking you can just keep some pushbufs around for one
> channel and switch over the pagetable entries to the next free buffer,
> just make sure that userspace is never able to tamper with a buffer as
> long as the gpu isn't done with it.

That's really not something dma-buf APIs are equipped to handle. We need
something to ensure user space doesn't have the buffer mapped (either
return error if has, or zap the mapping), something to ensure user space
cannot mmap the buffer, and something to revert this all once we're done.

We could add these as special ops to tegradrm dmabuf code for now, and
assume that command streams are always allocated by tegradrm. Now we
allow any dmabuf to be used as buffers for command streams.

And, with IOMMU I don't think we would need any of this. I guess we need
to press the gas pedal on figuring out how to enable that for tegradrm
on Tegra30.

We already allocate multiple buffers to be able to fill in the next
buffer once we've send one to kernel, so that part is ok. We reuse only
once we know that the operations contained are done.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-28 Thread Terje Bergström
On 28.11.2012 01:00, Dave Airlie wrote:
> We  generally aim for the first, to stop the gpu from reading/writing
> any memory it hasn't been granted access to,
> the second is nice to have though, but really requires a GPU with VM
> to implement properly.

I wonder if we should aim at root only access on Tegra20, and force
IOMMU on Tegra30 and fix the remaining issues we have with IOMMU. The
firewall turns out to be more complicated than I wished.

Biggest problem is that we aim at zero-copy for everything possible,
including command streams. Kernel gets a handle to a command stream, but
the command stream is allocated by the user space process. So the user
space can tamper with the stream once it's been written to the host1x 2D
channel.

Copying with firewall is one option, but that would again kill the
performance. One option would be user space unmapping the command buffer
when it's sent to kernel, and kernel checking that it's unmapped before
it agrees to send the stream to hardware.

On Tegra30 with IOMMU turned on things are ok without any checks,
because all access would go via MMU, which makes kernel memory inaccessible.

Of course, better ideas are welcome.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-27 Thread Terje Bergström
On 27.11.2012 13:47, Lucas Stach wrote:
> I guess we could change IOMMU address spaces for the graphics units
> depending on the active channel. This would still be a bit of a
> performance hit, because of the necessary TLB flushing and so on, but
> should be much better than checking the whole command stream. This way
> you at least get security on a process level, as no process is able to
> corrupt another processes graphics resources.

One physical channel is shared with all users of the 2D unit. Each job
is just added to the queue, and host1x will happily cross from one job
to the next without intervention from CPU. This is done to keep CPU
overhead down to improve power and performance.

This also means that we cannot change the IOMMU settings between jobs
from different processes, unless we pause the channel after every job.

This is still an interesting thought - can we postpone binding of a
buffer to address space until submit time, and give each process its own
address space? We would have a limit of "submits from three processes
going on at once" instead of "three processes can open 2D channel at
once". That's a limitation we could live with.

Naturally, Tegra2 is still left in the cold.

> This is the same level of security as provided by the nouveau driver.
> But to do so all memory management has to be done in kernel and from the
> current submissions of the 2D infrastructure I fear that the current
> architecture does too much of that in userspace, but I'll hold back with
> any judgement until we actually get to see the userspace parts.

User space allocates buffer, exports as dmabuf fd, and passes the fd in
submits to kernel, and frees the buffer. No other memory management is
done in user space.

> Also to implement this strategy you have to take ownership of the
> graphics address space on a much lower level than the DMA API. This
> might take some work together with the IOMMU guys.

I'll go through this with Hiroshi, who knows that area.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-27 Thread Terje Bergström
On 27.11.2012 12:37, Thierry Reding wrote:
> But in that case it should be made mandatory at first until proper IOMMU
> support is enabled on Tegra30. Then it can be checked at driver probe
> time whether or not to enable the extra checks. That way we don't need a
> special Kconfig option and we still get all the security that we need,
> right?

I guess it depends on the level of security.

If we want to only protect kernel and user space memory, this would be
sufficient and no firewall is needed if IOMMU is turned on.

If we want to protect 2D buffers from each other, this is not sufficient.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-27 Thread Terje Bergström
On 27.11.2012 10:32, Dave Airlie wrote:
> On Tue, Nov 27, 2012 at 8:16 AM, Terje Bergström  
> wrote:
>> Thanks for the pointer, I looked at exynos code. It indeed checks the
>> registers written to, but it doesn't prevent overrun by checking sizes
>> of buffers and compare against requests.
> They probably need to add that, its not as important as the base
> addresses, unless it takes negative strides, generally base addresses
> means you can target current->uid quite easily!

Ok. We'll implement the firewall, unless we come up with even a better
choice.

>> If this is the way to go, I'll put the firewall behind a Kconfig flag so
>> that system integrator can decide if his system needs it.
> We don't generally make security like this optional :-)
> 
> If you do that you should restrict the drm device to root users only,
> and never let a user with a browser anywhere near it.

My thinking was that the system integrator can decide how much he trusts
the binaries (incl. browser plugins) in the system. If he trusts the
binaries, the firewall can be turned off.

> Like I know what you guys get away with in closed source world, but
> here we don't write root holes into our driver deliberately.

Duly noted. :-)

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-27 Thread Terje Bergström
On 27.11.2012 09:33, Dave Airlie wrote:
>> Third would be having a firewall in 2D driver checking the stream and
>> ensuring all registers that accept addresses are written by values
>> derived from dmabufs. I haven't tried implementing this, but it'd
>> involve a lookup table in kernel and CPU reading through the command
>> stream. Offsets and sizes would also need to be validated. There would
>> be a performance hit.
> 
> This is the standard mechanism, and what exynos does as well.
> 
> The per process VM method is also used as an extension to this on some hw.

Hi,

Thanks for the pointer, I looked at exynos code. It indeed checks the
registers written to, but it doesn't prevent overrun by checking sizes
of buffers and compare against requests.

Based on my experience with Tegra graphics stack, going through command
streams in kernel is bad for performance. For 2D operations this is
probably ok as the command streams are pretty simple. Anything more
complex is going to cause severe degradation of performance, but it's
also outside the scope of this patch set.

If this is the way to go, I'll put the firewall behind a Kconfig flag so
that system integrator can decide if his system needs it.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-26 Thread Terje Bergström
On 27.11.2012 00:15, Dave Airlie wrote:
>>  static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>>  {
>> -   return 0;
>> +   struct tegra_drm_fpriv *fpriv;
>> +   int err = 0;
>> +
>> +   fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
>> +   if (!fpriv)
>> +   return -ENOMEM;
>> +
>> +   INIT_LIST_HEAD(&fpriv->contexts);
>> +   filp->driver_priv = fpriv;
>> +
> 
> who frees this?

Probably nobody. Will fix.

>> +struct tegra_drm_syncpt_incr_args {
>> +   __u32 id;
>> +};
> 
> add 32-bits of padding here
> 
>> +
>> +struct tegra_drm_syncpt_wait_args {
>> +   __u32 id;
>> +   __u32 thresh;
>> +   __s32 timeout;
>> +   __u32 value;
>> +};
>> +
>> +#define DRM_TEGRA_NO_TIMEOUT   (-1)
>> +
>> +struct tegra_drm_open_channel_args {
>> +   __u32 class;
>> +   void *context;
> 
> no pointers use u64, align them to 64-bits, so 32-bits of padding,

I'll add the paddings and change pointers to u64's to all of the structs
in this file.

> i'll look at the rest of the patches, but I need to know what commands
> can be submitted via this interface and what are the security
> implications of it.

All of the commands are memory operations (copy, clear, rotate, etc)
involving either one or two memory regions that are defined via dmabuf
fd's and offsets. The commands can also contain plain device virtual
addresses and 2D would be happy to oblige as long as the memory is
mapped to it.

There are a few ways to help the situation. None of them are perfect.

On Tegra30 we could allocate an address space per process. This would
mean that max 3 processes would be able to use the 2D unit at one time,
assuming that other devices are find using the one remaining address
space. On Tegra20 this is not an option.

Second would be using device permissions - only allow selected processes
to access 2D.

Third would be having a firewall in 2D driver checking the stream and
ensuring all registers that accept addresses are written by values
derived from dmabufs. I haven't tried implementing this, but it'd
involve a lookup table in kernel and CPU reading through the command
stream. Offsets and sizes would also need to be validated. There would
be a performance hit.

Fourth would be to move the creation of streams to kernel space. That'd
mean moving the whole 2D driver and host1x command stream handling to
kernel space and quite a lot of time spent in kernel. I'm not too keen
on this for obvious reasons.

Other ideas are obviously welcome.

Thanks!

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 5/8] ARM: tegra: Add auxiliary data for nvhost

2012-11-26 Thread Terje Bergström
On 27.11.2012 01:39, Stephen Warren wrote:
> Clock names shouldn't be passed in platform data; instead, clk_get()
> should be passed the device object and device-relative (i.e. not global)
> clock name. I expect if the driver is fixed to make this change, the
> changes to tegra*_clocks_data.c won't be needed either.

Isn't this code doing exactly that - getting a device relative clock,
nvhost_module_init() in nvhost.acm.c:

(...)
/* initialize clocks to known state */
while (pdata->clocks[i].name && i < NVHOST_MODULE_MAX_CLOCKS) {
long rate = pdata->clocks[i].default_rate;
struct clk *c;

c = devm_clk_get(&dev->dev, pdata->clocks[i].name);
if (IS_ERR_OR_NULL(c)) {
dev_err(&dev->dev, "Cannot get clock %s\n",
pdata->clocks[i].name);
return -ENODEV;
}

rate = clk_round_rate(c, rate);
clk_prepare_enable(c);
clk_set_rate(c, rate);
clk_disable_unprepare(c);
pdata->clk[i] = c;
i++;
}
(...)

Without the clock changes, the clocks in board files are now assigned to
devid "tegra_grhost". I guess the correct way to do this would be to
assign them to "tegra-gr2d" (2d, epp) and "host1x" - except if we also
want to drop "tegra-" from the device name.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 0/2] NVIDIA Tegra DRM driver

2012-11-15 Thread Terje Bergström
On 15.11.2012 23:28, Thierry Reding wrote:
> This third version of the patch series cleans up some minor issues that
> were found in the previous versions, such as deferred probe not working
> properly and the display remaining enabled after the driver is removed.
> 
> I've also put the two patches in this series into the tegra/drm/for-3.8
> branch of my repository on gitorious[0].

Excellent work. Thanks a lot Thierry!

I tested this and the device tree patches on my Tegra20 board. For both:

Acked-by: Terje Bergstrom 
Tested-by: Terje Bergstrom 

Best regards,
Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >