[PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-03 Thread Mark Zhang
On 01/03/2013 01:50 PM, Terje Bergstr?m wrote:
> On 03.01.2013 05:31, Mark Zhang wrote:
>> Sorry I didn't get it. Yes, in current design, you can pin all mem
>> handles in one time but I haven't found anything related with "locking
>> only once per submit".
>>
>> My idea is:
>> - remove "job->addr_phys"
>> - assign "job->reloc_addr_phys" & "job->gather_addr_phys" separately
>> - In "pin_job_mem", just call "host1x_memmgr_pin_array_ids" twice to
>> fill the "reloc_addr_phy" & "gather_addr_phys".
>>
>> Anything I misunderstood?
> 
> The current design uses CMA, which makes pinning basically a no-op. When
> we have IOMMU support, pinning requires calling into IOMMU. Updating
> SMMU tables requires locking, and probably maintenance before SMMU code
> also requires its own locked data tables. For example, preventing
> duplicate pinning might require a global table of handles.
> 
> Putting all of the handles in one table allows doing duplicate detection
> across cmdbuf and reloc tables. This allows pinning each buffer exactly
> once, which reduces number of calls to IOMMU.
> 

Thanks Terje. Yes, I understand IOMMU will bring in more operations. But
I'm not convinced that separating 2 arrays have lots of overheads than
putting them into one array.

But it doesn't matter, after the IOMMU support version released, I can
read this part of codes again. Let's talk about this at that time.

>> "host1x_cma_pin_array_ids" doesn't return negative value right now, so
>> maybe you need to take a look at it.
> 
> True, and also a consequence of using CMA: pinning can never fail. With
> IOMMU, pinning can fail.

Got it. Agree.

> 
> Terje
> 


[PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-03 Thread Mark Zhang
On 01/02/2013 05:42 PM, Terje Bergstr?m wrote:
> On 28.12.2012 11:14, Mark Zhang wrote:
>> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
>> index a936902..c3ded60 100644
>> --- a/drivers/gpu/drm/tegra/gr2d.c
>> +++ b/drivers/gpu/drm/tegra/gr2d.c
>> @@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context
>> *context,
>> if (err)
>> goto fail;
>>
>> +   /* We define CMA as an temporary solution in host1x driver.
>> +  That's also why we have a CMA kernel config in it.
>> +  But seems here, in tegradrm, we hardcode the CMA here.
>> +  So what do you do when host1x change to IOMMU?
>> +  Do we also need to define a CMA config in tegradrm
>> driver,
>> +  then after host1x changes to IOMMU, we add another IOMMU
>> +  config in tegradrm? Or we should invent another more
>> +  generic wrapper layer here? */
>> cmdbuf.mem = handle_cma_to_host1x(drm, file_priv,
>> cmdbuf.mem);
> 
> Lucas is working on host1x allocator, so let's defer this comment until
> we have Lucas' code.
> 

OK.

>> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
>> index cc8ca2f..0867b72 100644
>> --- a/drivers/gpu/host1x/job.c
>> +++ b/drivers/gpu/host1x/job.c
>> @@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct
>> host1x_channel *ch,
>> mem += num_unpins * sizeof(dma_addr_t);
>> job->pin_ids = num_unpins ? mem : NULL;
>>
>> +   /* I think this is a somewhat ugly design.
>> +  Actually "addr_phys" is consisted by "reloc_addr_phys" and
>> +  "gather_addr_phys".
>> +  Why don't we just declare "reloc_addr_phys" and
>> "gather_addr_phys"?
>> +  In current design, let's say if one nvhost newbie changes the
>> order
>> +  of these 2 blocks of code in function "pin_job_mem":
>> +
>> +  for (i = 0; i < job->num_relocs; i++) {
>> +   struct host1x_reloc *reloc = >relocarray[i];
>> +   job->pin_ids[count] = reloc->target;
>> +   count++;
>> +  }
>> +
>> +  for (i = 0; i < job->num_gathers; i++) {
>> +   struct host1x_job_gather *g = >gathers[i];
>> +   job->pin_ids[count] = g->mem_id;
>> +   count++;
>> +  }
>> +
>> +  Then likely something weird occurs... */
> 
> We do this because this way we can implement batch pinning of memory
> handles. This way we can decrease memory handle management a lot as we
> need to do locking only once per submit.
> 

Sorry I didn't get it. Yes, in current design, you can pin all mem
handles in one time but I haven't found anything related with "locking
only once per submit".

My idea is:
- remove "job->addr_phys"
- assign "job->reloc_addr_phys" & "job->gather_addr_phys" separately
- In "pin_job_mem", just call "host1x_memmgr_pin_array_ids" twice to
fill the "reloc_addr_phy" & "gather_addr_phys".

Anything I misunderstood?

> Decreasing memory management overhead is really important, because in
> complex graphics cases, there are might be a hundreds of buffer
> references per submit, and several submits per frame. Any extra overhead
> relates directly to reduced performance.
> 
>> job->reloc_addr_phys = job->addr_phys;
>> job->gather_addr_phys = >addr_phys[num_relocs];
>>
>> @@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
>> }
>> }
>>
>> +   /* I have question here. Does this mean the address info
>> +  which need to be relocated(according to the libdrm codes,
>> +  seems this address is "0xDEADBEEF") always staying at the
>> +  beginning of a page? */
>> __raw_writel(
>> (job->reloc_addr_phys[i] +
>> reloc->target_offset) >> reloc->shift,
> 
> No - the slot can be anywhere. That's why we have cmdbuf_offset in the
> reloc struct.
> 

Yes. Sorry I must misread the code before.

>> @@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct
>> platform_device *pdev)
>> }
>> }
>>
>> +   /* if (host1x_firewall && !err) { */
>> if (host1x_firewall) {
>> err = copy_gathers(job, pdev);
>> if (err) {
> 
> Will add.
> 
>> @@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct
>> platform_device *pdev)
>> }
>> }
>>
>> +/* Rename this label to "out" or something else.
>> +   Because if everything goes right, the codes under this label also
>> +   get executed. */
>>  fail:
>> wmb();
> 
> Will do.
> 
>>
>> diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
>> index f3954f7..bb5763d 100644
>> --- a/drivers/gpu/host1x/memmgr.c
>> +++ b/drivers/gpu/host1x/memmgr.c
>> @@ -156,6 

[PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-03 Thread Mark Zhang
On 01/02/2013 05:25 PM, Terje Bergstr?m wrote:
> On 26.12.2012 11:42, Mark Zhang wrote:
[...]
> 
>>
>> if (!de)
>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
>> index 07e8813..01ed10d 100644
>> --- a/drivers/gpu/host1x/dev.c
>> +++ b/drivers/gpu/host1x/dev.c
>> @@ -38,6 +38,7 @@
>>
>>  struct host1x *host1x;
>>
>> +/* Called by drm unlocked ioctl function. So do we need a lock here? */
>>  void host1x_syncpt_incr_byid(u32 id)
> 
> No, host1x_syncpt_incr_byid is SMP safe.

Correct. Lock is unnecessary.

> 
>>  {
>> struct host1x_syncpt *sp = host1x->syncpt + id;
>> @@ -52,6 +53,7 @@ u32 host1x_syncpt_read_byid(u32 id)
>>  }
>>  EXPORT_SYMBOL(host1x_syncpt_read_byid);
>>
>> +/* Called by drm unlocked ioctl function. So do we need a lock here? */
>>  int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value)
> 
> Same here, SMP safe.
> 
>>  {
>> struct host1x_syncpt *sp = host1x->syncpt + id;
>> @@ -161,6 +163,8 @@ static int host1x_probe(struct platform_device *dev)
>>
>> err = host1x_alloc_resources(host);
>> if (err) {
>> +   /* We don't have chip_ops right now, so here the
>> +  error message is somewhat improper */
>> dev_err(>dev, "failed to init chip support\n");
>> goto fail;
>> }
> 
> Actually, alloc_resources only allocates intr->syncpt, so I the code to
> host1x_intr_init().
> 
>> @@ -175,6 +179,14 @@ static int host1x_probe(struct platform_device *dev)
>> if (!host->syncpt)
>> goto fail;
>>
>> +   /* I suggest create a dedicate function for initializing nop sp.
>> +  First this "_host1x_syncpt_alloc" looks like an internal/static
>> +  function.
>> +  Then actually "host1x_syncpt_alloc" & "_host1x_syncpt_alloc" all
>> +  serve host1x client(e.g: gr2d) so it's not suitable to use them
>> +  for nop sp.
>> +  Just create a wrapper function to call _host1x_syncpt_alloc is OK.
>> +  This will make the code more readable. */
>> host->nop_sp = _host1x_syncpt_alloc(host, NULL, 0);
> 
> _host1x_syncpt_alloc is an internal function, not exported.
> host1x_syncpt_alloc is exported. I think it's even better if I just move
> allocation of nop_sp to happen in host1x_syncpt_init.
> 

Agree.

>> if (!host->nop_sp)
>> goto fail;
[...]
> 
>> diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
>> index efcb9be..e112001 100644
>> --- a/drivers/gpu/host1x/intr.c
>> +++ b/drivers/gpu/host1x/intr.c
>> @@ -329,8 +329,13 @@ void host1x_intr_deinit(struct host1x_intr *intr)
>>  void host1x_intr_start(struct host1x_intr *intr, u32 hz)
>>  {
>> struct host1x *host1x = intr_to_host1x(intr);
>> +
>> +   /* Why we need to lock here? Seems like this function is
>> +  called by host1x's probe only. */
>> mutex_lock(>mutex);
>>
>> +   /* "init_host_sync" has already been called in function
>> +  host1x_intr_init. Remove this line. */
>> host1x->intr_op.init_host_sync(intr);
>> host1x->intr_op.set_host_clocks_per_usec(intr,
>> DIV_ROUND_UP(hz, 100));
> 
> In future, we'll call host1x_intr_start() whenever host1x is turned on.
> Thus we need locking.
>
> init_host_sync() should actually be called from host1x_intr_start(), not
> _init().
> 

Got it. Thanks for explanation.

>> diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
>> index 07cbca5..9a234ad 100644
>> --- a/drivers/gpu/host1x/syncpt.c
>> +++ b/drivers/gpu/host1x/syncpt.c
>> @@ -309,6 +309,8 @@ struct host1x_syncpt *host1x_syncpt_init(struct
>> host1x *host)
>> struct host1x_syncpt *syncpt, *sp;
>> int i;
>>
>> +   /* Consider devm_kzalloc here. Then you can forget the release
>> +  stuffs about this "syncpt". */
>> syncpt = sp = kzalloc(sizeof(struct host1x_syncpt) * 
>> host->info.nb_pts,
>> GFP_KERNEL);
>> if (!syncpt)
> 
> Will do.
> 
> Thanks!
> 
> Terje
> 


[PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-03 Thread Terje Bergström
On 21.12.2012 15:50, Lucas Stach wrote:
> Am Freitag, den 21.12.2012, 13:39 +0200 schrieb Terje Bergstrom:
>> Some of the issues left open:
>>  * Register definitions still use static inline. There has been a
>>debate about code style versus ability to use compiler type
>>checking and code coverage analysis. There was no conclusion, so
>>I left it as was.
> This has to be resolved before merging. Personally I'm in favour of
> keeping reg access patterns close to what is done in other parts of the
> kernel.

How about if I define inline functions, and #defines to proxy to them?
Something like:

static inline u32 host1x_sync_cfpeek_ctrl_r(void)
{
return 0x74c;
}
#define HOST1X_SYNC_CFPEEK_CTRL \
host1x_sync_cfpeek_ctrl_r()

static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_f(u32 v)
{
return (v & 0x1) << 31;
}
#define HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ENA_F(v) \
host1x_sync_cfpeek_ctrl_cfpeek_ena_f(v)

I'd use the macros in .c files. That way the references will look
familiar to reader of .c files, but we can still get the benefits of
compiler processing (type checking, better error codes etc) and gcov
coverage tracking.

Terje


[PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-03 Thread Terje Bergström
On 03.01.2013 05:31, Mark Zhang wrote:
> Sorry I didn't get it. Yes, in current design, you can pin all mem
> handles in one time but I haven't found anything related with "locking
> only once per submit".
> 
> My idea is:
> - remove "job->addr_phys"
> - assign "job->reloc_addr_phys" & "job->gather_addr_phys" separately
> - In "pin_job_mem", just call "host1x_memmgr_pin_array_ids" twice to
> fill the "reloc_addr_phy" & "gather_addr_phys".
> 
> Anything I misunderstood?

The current design uses CMA, which makes pinning basically a no-op. When
we have IOMMU support, pinning requires calling into IOMMU. Updating
SMMU tables requires locking, and probably maintenance before SMMU code
also requires its own locked data tables. For example, preventing
duplicate pinning might require a global table of handles.

Putting all of the handles in one table allows doing duplicate detection
across cmdbuf and reloc tables. This allows pinning each buffer exactly
once, which reduces number of calls to IOMMU.

> "host1x_cma_pin_array_ids" doesn't return negative value right now, so
> maybe you need to take a look at it.

True, and also a consequence of using CMA: pinning can never fail. With
IOMMU, pinning can fail.

Terje


[PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 28.12.2012 11:14, Mark Zhang wrote:
> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
> index a936902..c3ded60 100644
> --- a/drivers/gpu/drm/tegra/gr2d.c
> +++ b/drivers/gpu/drm/tegra/gr2d.c
> @@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context
> *context,
> if (err)
> goto fail;
> 
> +   /* We define CMA as an temporary solution in host1x driver.
> +  That's also why we have a CMA kernel config in it.
> +  But seems here, in tegradrm, we hardcode the CMA here.
> +  So what do you do when host1x change to IOMMU?
> +  Do we also need to define a CMA config in tegradrm
> driver,
> +  then after host1x changes to IOMMU, we add another IOMMU
> +  config in tegradrm? Or we should invent another more
> +  generic wrapper layer here? */
> cmdbuf.mem = handle_cma_to_host1x(drm, file_priv,
> cmdbuf.mem);

Lucas is working on host1x allocator, so let's defer this comment until
we have Lucas' code.

> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index cc8ca2f..0867b72 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
> @@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct
> host1x_channel *ch,
> mem += num_unpins * sizeof(dma_addr_t);
> job->pin_ids = num_unpins ? mem : NULL;
> 
> +   /* I think this is a somewhat ugly design.
> +  Actually "addr_phys" is consisted by "reloc_addr_phys" and
> +  "gather_addr_phys".
> +  Why don't we just declare "reloc_addr_phys" and
> "gather_addr_phys"?
> +  In current design, let's say if one nvhost newbie changes the
> order
> +  of these 2 blocks of code in function "pin_job_mem":
> +
> +  for (i = 0; i < job->num_relocs; i++) {
> +   struct host1x_reloc *reloc = >relocarray[i];
> +   job->pin_ids[count] = reloc->target;
> +   count++;
> +  }
> +
> +  for (i = 0; i < job->num_gathers; i++) {
> +   struct host1x_job_gather *g = >gathers[i];
> +   job->pin_ids[count] = g->mem_id;
> +   count++;
> +  }
> +
> +  Then likely something weird occurs... */

We do this because this way we can implement batch pinning of memory
handles. This way we can decrease memory handle management a lot as we
need to do locking only once per submit.

Decreasing memory management overhead is really important, because in
complex graphics cases, there are might be a hundreds of buffer
references per submit, and several submits per frame. Any extra overhead
relates directly to reduced performance.

> job->reloc_addr_phys = job->addr_phys;
> job->gather_addr_phys = >addr_phys[num_relocs];
> 
> @@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
> }
> }
> 
> +   /* I have question here. Does this mean the address info
> +  which need to be relocated(according to the libdrm codes,
> +  seems this address is "0xDEADBEEF") always staying at the
> +  beginning of a page? */
> __raw_writel(
> (job->reloc_addr_phys[i] +
> reloc->target_offset) >> reloc->shift,

No - the slot can be anywhere. That's why we have cmdbuf_offset in the
reloc struct.

> @@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct
> platform_device *pdev)
> }
> }
> 
> +   /* if (host1x_firewall && !err) { */
> if (host1x_firewall) {
> err = copy_gathers(job, pdev);
> if (err) {

Will add.

> @@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct
> platform_device *pdev)
> }
> }
> 
> +/* Rename this label to "out" or something else.
> +   Because if everything goes right, the codes under this label also
> +   get executed. */
>  fail:
> wmb();

Will do.

> 
> diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
> index f3954f7..bb5763d 100644
> --- a/drivers/gpu/host1x/memmgr.c
> +++ b/drivers/gpu/host1x/memmgr.c
> @@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct
> platform_device *dev,
> count, _data[pin_count],
> phys_addr);
> 
> +   /* I don't think the current "host1x_cma_pin_array_ids"
> +  is able to return a negative value. So this "if" doesn't
> +  make sense...*/
> if (cma_count < 0) {
> /* clean up previous handles */
> while (pin_count) {

It should return negative in case of error.

Terje


[PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 26.12.2012 11:42, Mark Zhang wrote:
> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
> index a936902..28954b3 100644
> --- a/drivers/gpu/drm/tegra/gr2d.c
> +++ b/drivers/gpu/drm/tegra/gr2d.c
> @@ -247,6 +247,10 @@ static int __devinit gr2d_probe(struct
> platform_device *dev)
>  {
> int err;
> struct gr2d *gr2d = NULL;
> +   /* Cause drm_device is created in host1x driver. So
> +  if host1x drivers loads after tegradrm, then in this
> +  gr2d_probe function, this "drm_device" will be NULL.
> +  How can we handle this? Defer driver probe? */

I will push a new proposal about how the devices & drivers get probed.

> struct platform_device *drm_device =
> host1x_drm_device(to_platform_device(dev->dev.parent));
> struct tegradrm *tegradrm = platform_get_drvdata(drm_device);
> @@ -273,6 +277,11 @@ static int __devinit gr2d_probe(struct
> platform_device *dev)
> 
> gr2d->syncpt = host1x_syncpt_alloc(dev, 0);
> if (!gr2d->syncpt) {
> +   /* Do we really need this?
> +  After "host1x_channel_alloc", the refcount of this
> +  channel is 0. So calling host1x_channel_put here will
> +  make the refcount going to negative.
> +  I suppose we should call "host1x_free_channel" here. */

True. I need to also export host1x_channel_free (I will change the name,
too).

> host1x_channel_put(gr2d->channel);
> return -ENOMEM;
> }
> @@ -284,6 +293,7 @@ static int __devinit gr2d_probe(struct
> platform_device *dev)
> 
> err = tegra_drm_register_client(tegradrm, >client);
> if (err)
> +   /* Add "host1x_free_channel" */

Will do.

> return err;
> 
> platform_set_drvdata(dev, gr2d);
> diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
> index 3705cae..ed83b9e 100644
> --- a/drivers/gpu/host1x/channel.c
> +++ b/drivers/gpu/host1x/channel.c
> @@ -95,6 +95,9 @@ struct host1x_channel *host1x_channel_alloc(struct
> platform_device *pdev)
> int max_channels = host1x->info.nb_channels;
> int err;
> 
> +   /* Is it necessary to add mutex protection here?
> +  I'm just wondering in a smp system, multiple host1x clients
> +  may try to alloc their channels simultaneously... */

I'll add locking.

> if (chindex > max_channels)
> return NULL;
> 
> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> index 86d5c70..f2bd5aa 100644
> --- a/drivers/gpu/host1x/debug.c
> +++ b/drivers/gpu/host1x/debug.c
> @@ -164,6 +164,10 @@ static const struct file_operations
> host1x_debug_fops = {
> 
>  void host1x_debug_init(struct host1x *host1x)
>  {
> +   /* I think it's better to set this directory name the same with
> +  the driver's name -- defined in dev.c:
> +  #define DRIVER_NAME  "tegra-host1x"
> +  */
> struct dentry *de = debugfs_create_dir("tegra_host", NULL);

Will do.

> 
> if (!de)
> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> index 07e8813..01ed10d 100644
> --- a/drivers/gpu/host1x/dev.c
> +++ b/drivers/gpu/host1x/dev.c
> @@ -38,6 +38,7 @@
> 
>  struct host1x *host1x;
> 
> +/* Called by drm unlocked ioctl function. So do we need a lock here? */
>  void host1x_syncpt_incr_byid(u32 id)

No, host1x_syncpt_incr_byid is SMP safe.

>  {
> struct host1x_syncpt *sp = host1x->syncpt + id;
> @@ -52,6 +53,7 @@ u32 host1x_syncpt_read_byid(u32 id)
>  }
>  EXPORT_SYMBOL(host1x_syncpt_read_byid);
> 
> +/* Called by drm unlocked ioctl function. So do we need a lock here? */
>  int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value)

Same here, SMP safe.

>  {
> struct host1x_syncpt *sp = host1x->syncpt + id;
> @@ -161,6 +163,8 @@ static int host1x_probe(struct platform_device *dev)
> 
> err = host1x_alloc_resources(host);
> if (err) {
> +   /* We don't have chip_ops right now, so here the
> +  error message is somewhat improper */
> dev_err(>dev, "failed to init chip support\n");
> goto fail;
> }

Actually, alloc_resources only allocates intr->syncpt, so I the code to
host1x_intr_init().

> @@ -175,6 +179,14 @@ static int host1x_probe(struct platform_device *dev)
> if (!host->syncpt)
> goto fail;
> 
> +   /* I suggest create a dedicate function for initializing nop sp.
> +  First this "_host1x_syncpt_alloc" looks like an internal/static
> +  function.
> +  Then actually "host1x_syncpt_alloc" & "_host1x_syncpt_alloc" all
> +  serve host1x client(e.g: gr2d) so it's not suitable to use them
> +  for nop sp.
> +  Just create a wrapper function to call _host1x_syncpt_alloc is OK.
> +  This will make the code more 

Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 26.12.2012 11:42, Mark Zhang wrote:
 diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
 index a936902..28954b3 100644
 --- a/drivers/gpu/drm/tegra/gr2d.c
 +++ b/drivers/gpu/drm/tegra/gr2d.c
 @@ -247,6 +247,10 @@ static int __devinit gr2d_probe(struct
 platform_device *dev)
  {
 int err;
 struct gr2d *gr2d = NULL;
 +   /* Cause drm_device is created in host1x driver. So
 +  if host1x drivers loads after tegradrm, then in this
 +  gr2d_probe function, this drm_device will be NULL.
 +  How can we handle this? Defer driver probe? */

I will push a new proposal about how the devices  drivers get probed.

 struct platform_device *drm_device =
 host1x_drm_device(to_platform_device(dev-dev.parent));
 struct tegradrm *tegradrm = platform_get_drvdata(drm_device);
 @@ -273,6 +277,11 @@ static int __devinit gr2d_probe(struct
 platform_device *dev)
 
 gr2d-syncpt = host1x_syncpt_alloc(dev, 0);
 if (!gr2d-syncpt) {
 +   /* Do we really need this?
 +  After host1x_channel_alloc, the refcount of this
 +  channel is 0. So calling host1x_channel_put here will
 +  make the refcount going to negative.
 +  I suppose we should call host1x_free_channel here. */

True. I need to also export host1x_channel_free (I will change the name,
too).

 host1x_channel_put(gr2d-channel);
 return -ENOMEM;
 }
 @@ -284,6 +293,7 @@ static int __devinit gr2d_probe(struct
 platform_device *dev)
 
 err = tegra_drm_register_client(tegradrm, gr2d-client);
 if (err)
 +   /* Add host1x_free_channel */

Will do.

 return err;
 
 platform_set_drvdata(dev, gr2d);
 diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
 index 3705cae..ed83b9e 100644
 --- a/drivers/gpu/host1x/channel.c
 +++ b/drivers/gpu/host1x/channel.c
 @@ -95,6 +95,9 @@ struct host1x_channel *host1x_channel_alloc(struct
 platform_device *pdev)
 int max_channels = host1x-info.nb_channels;
 int err;
 
 +   /* Is it necessary to add mutex protection here?
 +  I'm just wondering in a smp system, multiple host1x clients
 +  may try to alloc their channels simultaneously... */

I'll add locking.

 if (chindex  max_channels)
 return NULL;
 
 diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
 index 86d5c70..f2bd5aa 100644
 --- a/drivers/gpu/host1x/debug.c
 +++ b/drivers/gpu/host1x/debug.c
 @@ -164,6 +164,10 @@ static const struct file_operations
 host1x_debug_fops = {
 
  void host1x_debug_init(struct host1x *host1x)
  {
 +   /* I think it's better to set this directory name the same with
 +  the driver's name -- defined in dev.c:
 +  #define DRIVER_NAME  tegra-host1x
 +  */
 struct dentry *de = debugfs_create_dir(tegra_host, NULL);

Will do.

 
 if (!de)
 diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
 index 07e8813..01ed10d 100644
 --- a/drivers/gpu/host1x/dev.c
 +++ b/drivers/gpu/host1x/dev.c
 @@ -38,6 +38,7 @@
 
  struct host1x *host1x;
 
 +/* Called by drm unlocked ioctl function. So do we need a lock here? */
  void host1x_syncpt_incr_byid(u32 id)

No, host1x_syncpt_incr_byid is SMP safe.

  {
 struct host1x_syncpt *sp = host1x-syncpt + id;
 @@ -52,6 +53,7 @@ u32 host1x_syncpt_read_byid(u32 id)
  }
  EXPORT_SYMBOL(host1x_syncpt_read_byid);
 
 +/* Called by drm unlocked ioctl function. So do we need a lock here? */
  int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value)

Same here, SMP safe.

  {
 struct host1x_syncpt *sp = host1x-syncpt + id;
 @@ -161,6 +163,8 @@ static int host1x_probe(struct platform_device *dev)
 
 err = host1x_alloc_resources(host);
 if (err) {
 +   /* We don't have chip_ops right now, so here the
 +  error message is somewhat improper */
 dev_err(dev-dev, failed to init chip support\n);
 goto fail;
 }

Actually, alloc_resources only allocates intr-syncpt, so I the code to
host1x_intr_init().

 @@ -175,6 +179,14 @@ static int host1x_probe(struct platform_device *dev)
 if (!host-syncpt)
 goto fail;
 
 +   /* I suggest create a dedicate function for initializing nop sp.
 +  First this _host1x_syncpt_alloc looks like an internal/static
 +  function.
 +  Then actually host1x_syncpt_alloc  _host1x_syncpt_alloc all
 +  serve host1x client(e.g: gr2d) so it's not suitable to use them
 +  for nop sp.
 +  Just create a wrapper function to call _host1x_syncpt_alloc is OK.
 +  This will make the code more readable. */
 host-nop_sp = _host1x_syncpt_alloc(host, NULL, 0);

_host1x_syncpt_alloc is an internal function, not exported.

Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 28.12.2012 11:14, Mark Zhang wrote:
 diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
 index a936902..c3ded60 100644
 --- a/drivers/gpu/drm/tegra/gr2d.c
 +++ b/drivers/gpu/drm/tegra/gr2d.c
 @@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context
 *context,
 if (err)
 goto fail;
 
 +   /* We define CMA as an temporary solution in host1x driver.
 +  That's also why we have a CMA kernel config in it.
 +  But seems here, in tegradrm, we hardcode the CMA here.
 +  So what do you do when host1x change to IOMMU?
 +  Do we also need to define a CMA config in tegradrm
 driver,
 +  then after host1x changes to IOMMU, we add another IOMMU
 +  config in tegradrm? Or we should invent another more
 +  generic wrapper layer here? */
 cmdbuf.mem = handle_cma_to_host1x(drm, file_priv,
 cmdbuf.mem);

Lucas is working on host1x allocator, so let's defer this comment until
we have Lucas' code.

 diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
 index cc8ca2f..0867b72 100644
 --- a/drivers/gpu/host1x/job.c
 +++ b/drivers/gpu/host1x/job.c
 @@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct
 host1x_channel *ch,
 mem += num_unpins * sizeof(dma_addr_t);
 job-pin_ids = num_unpins ? mem : NULL;
 
 +   /* I think this is a somewhat ugly design.
 +  Actually addr_phys is consisted by reloc_addr_phys and
 +  gather_addr_phys.
 +  Why don't we just declare reloc_addr_phys and
 gather_addr_phys?
 +  In current design, let's say if one nvhost newbie changes the
 order
 +  of these 2 blocks of code in function pin_job_mem:
 +
 +  for (i = 0; i  job-num_relocs; i++) {
 +   struct host1x_reloc *reloc = job-relocarray[i];
 +   job-pin_ids[count] = reloc-target;
 +   count++;
 +  }
 +
 +  for (i = 0; i  job-num_gathers; i++) {
 +   struct host1x_job_gather *g = job-gathers[i];
 +   job-pin_ids[count] = g-mem_id;
 +   count++;
 +  }
 +
 +  Then likely something weird occurs... */

We do this because this way we can implement batch pinning of memory
handles. This way we can decrease memory handle management a lot as we
need to do locking only once per submit.

Decreasing memory management overhead is really important, because in
complex graphics cases, there are might be a hundreds of buffer
references per submit, and several submits per frame. Any extra overhead
relates directly to reduced performance.

 job-reloc_addr_phys = job-addr_phys;
 job-gather_addr_phys = job-addr_phys[num_relocs];
 
 @@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
 }
 }
 
 +   /* I have question here. Does this mean the address info
 +  which need to be relocated(according to the libdrm codes,
 +  seems this address is 0xDEADBEEF) always staying at the
 +  beginning of a page? */
 __raw_writel(
 (job-reloc_addr_phys[i] +
 reloc-target_offset)  reloc-shift,

No - the slot can be anywhere. That's why we have cmdbuf_offset in the
reloc struct.

 @@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct
 platform_device *pdev)
 }
 }
 
 +   /* if (host1x_firewall  !err) { */
 if (host1x_firewall) {
 err = copy_gathers(job, pdev);
 if (err) {

Will add.

 @@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct
 platform_device *pdev)
 }
 }
 
 +/* Rename this label to out or something else.
 +   Because if everything goes right, the codes under this label also
 +   get executed. */
  fail:
 wmb();

Will do.

 
 diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
 index f3954f7..bb5763d 100644
 --- a/drivers/gpu/host1x/memmgr.c
 +++ b/drivers/gpu/host1x/memmgr.c
 @@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct
 platform_device *dev,
 count, unpin_data[pin_count],
 phys_addr);
 
 +   /* I don't think the current host1x_cma_pin_array_ids
 +  is able to return a negative value. So this if doesn't
 +  make sense...*/
 if (cma_count  0) {
 /* clean up previous handles */
 while (pin_count) {

It should return negative in case of error.

Terje
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Mark Zhang
On 01/02/2013 05:25 PM, Terje Bergström wrote:
 On 26.12.2012 11:42, Mark Zhang wrote:
[...]
 

 if (!de)
 diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
 index 07e8813..01ed10d 100644
 --- a/drivers/gpu/host1x/dev.c
 +++ b/drivers/gpu/host1x/dev.c
 @@ -38,6 +38,7 @@

  struct host1x *host1x;

 +/* Called by drm unlocked ioctl function. So do we need a lock here? */
  void host1x_syncpt_incr_byid(u32 id)
 
 No, host1x_syncpt_incr_byid is SMP safe.

Correct. Lock is unnecessary.

 
  {
 struct host1x_syncpt *sp = host1x-syncpt + id;
 @@ -52,6 +53,7 @@ u32 host1x_syncpt_read_byid(u32 id)
  }
  EXPORT_SYMBOL(host1x_syncpt_read_byid);

 +/* Called by drm unlocked ioctl function. So do we need a lock here? */
  int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value)
 
 Same here, SMP safe.
 
  {
 struct host1x_syncpt *sp = host1x-syncpt + id;
 @@ -161,6 +163,8 @@ static int host1x_probe(struct platform_device *dev)

 err = host1x_alloc_resources(host);
 if (err) {
 +   /* We don't have chip_ops right now, so here the
 +  error message is somewhat improper */
 dev_err(dev-dev, failed to init chip support\n);
 goto fail;
 }
 
 Actually, alloc_resources only allocates intr-syncpt, so I the code to
 host1x_intr_init().
 
 @@ -175,6 +179,14 @@ static int host1x_probe(struct platform_device *dev)
 if (!host-syncpt)
 goto fail;

 +   /* I suggest create a dedicate function for initializing nop sp.
 +  First this _host1x_syncpt_alloc looks like an internal/static
 +  function.
 +  Then actually host1x_syncpt_alloc  _host1x_syncpt_alloc all
 +  serve host1x client(e.g: gr2d) so it's not suitable to use them
 +  for nop sp.
 +  Just create a wrapper function to call _host1x_syncpt_alloc is OK.
 +  This will make the code more readable. */
 host-nop_sp = _host1x_syncpt_alloc(host, NULL, 0);
 
 _host1x_syncpt_alloc is an internal function, not exported.
 host1x_syncpt_alloc is exported. I think it's even better if I just move
 allocation of nop_sp to happen in host1x_syncpt_init.
 

Agree.

 if (!host-nop_sp)
 goto fail;
[...]
 
 diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
 index efcb9be..e112001 100644
 --- a/drivers/gpu/host1x/intr.c
 +++ b/drivers/gpu/host1x/intr.c
 @@ -329,8 +329,13 @@ void host1x_intr_deinit(struct host1x_intr *intr)
  void host1x_intr_start(struct host1x_intr *intr, u32 hz)
  {
 struct host1x *host1x = intr_to_host1x(intr);
 +
 +   /* Why we need to lock here? Seems like this function is
 +  called by host1x's probe only. */
 mutex_lock(intr-mutex);

 +   /* init_host_sync has already been called in function
 +  host1x_intr_init. Remove this line. */
 host1x-intr_op.init_host_sync(intr);
 host1x-intr_op.set_host_clocks_per_usec(intr,
 DIV_ROUND_UP(hz, 100));
 
 In future, we'll call host1x_intr_start() whenever host1x is turned on.
 Thus we need locking.

 init_host_sync() should actually be called from host1x_intr_start(), not
 _init().
 

Got it. Thanks for explanation.

 diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
 index 07cbca5..9a234ad 100644
 --- a/drivers/gpu/host1x/syncpt.c
 +++ b/drivers/gpu/host1x/syncpt.c
 @@ -309,6 +309,8 @@ struct host1x_syncpt *host1x_syncpt_init(struct
 host1x *host)
 struct host1x_syncpt *syncpt, *sp;
 int i;

 +   /* Consider devm_kzalloc here. Then you can forget the release
 +  stuffs about this syncpt. */
 syncpt = sp = kzalloc(sizeof(struct host1x_syncpt) * 
 host-info.nb_pts,
 GFP_KERNEL);
 if (!syncpt)
 
 Will do.
 
 Thanks!
 
 Terje
 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Mark Zhang
On 01/02/2013 05:42 PM, Terje Bergström wrote:
 On 28.12.2012 11:14, Mark Zhang wrote:
 diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
 index a936902..c3ded60 100644
 --- a/drivers/gpu/drm/tegra/gr2d.c
 +++ b/drivers/gpu/drm/tegra/gr2d.c
 @@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context
 *context,
 if (err)
 goto fail;

 +   /* We define CMA as an temporary solution in host1x driver.
 +  That's also why we have a CMA kernel config in it.
 +  But seems here, in tegradrm, we hardcode the CMA here.
 +  So what do you do when host1x change to IOMMU?
 +  Do we also need to define a CMA config in tegradrm
 driver,
 +  then after host1x changes to IOMMU, we add another IOMMU
 +  config in tegradrm? Or we should invent another more
 +  generic wrapper layer here? */
 cmdbuf.mem = handle_cma_to_host1x(drm, file_priv,
 cmdbuf.mem);
 
 Lucas is working on host1x allocator, so let's defer this comment until
 we have Lucas' code.
 

OK.

 diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
 index cc8ca2f..0867b72 100644
 --- a/drivers/gpu/host1x/job.c
 +++ b/drivers/gpu/host1x/job.c
 @@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct
 host1x_channel *ch,
 mem += num_unpins * sizeof(dma_addr_t);
 job-pin_ids = num_unpins ? mem : NULL;

 +   /* I think this is a somewhat ugly design.
 +  Actually addr_phys is consisted by reloc_addr_phys and
 +  gather_addr_phys.
 +  Why don't we just declare reloc_addr_phys and
 gather_addr_phys?
 +  In current design, let's say if one nvhost newbie changes the
 order
 +  of these 2 blocks of code in function pin_job_mem:
 +
 +  for (i = 0; i  job-num_relocs; i++) {
 +   struct host1x_reloc *reloc = job-relocarray[i];
 +   job-pin_ids[count] = reloc-target;
 +   count++;
 +  }
 +
 +  for (i = 0; i  job-num_gathers; i++) {
 +   struct host1x_job_gather *g = job-gathers[i];
 +   job-pin_ids[count] = g-mem_id;
 +   count++;
 +  }
 +
 +  Then likely something weird occurs... */
 
 We do this because this way we can implement batch pinning of memory
 handles. This way we can decrease memory handle management a lot as we
 need to do locking only once per submit.
 

Sorry I didn't get it. Yes, in current design, you can pin all mem
handles in one time but I haven't found anything related with locking
only once per submit.

My idea is:
- remove job-addr_phys
- assign job-reloc_addr_phys  job-gather_addr_phys separately
- In pin_job_mem, just call host1x_memmgr_pin_array_ids twice to
fill the reloc_addr_phy  gather_addr_phys.

Anything I misunderstood?

 Decreasing memory management overhead is really important, because in
 complex graphics cases, there are might be a hundreds of buffer
 references per submit, and several submits per frame. Any extra overhead
 relates directly to reduced performance.
 
 job-reloc_addr_phys = job-addr_phys;
 job-gather_addr_phys = job-addr_phys[num_relocs];

 @@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
 }
 }

 +   /* I have question here. Does this mean the address info
 +  which need to be relocated(according to the libdrm codes,
 +  seems this address is 0xDEADBEEF) always staying at the
 +  beginning of a page? */
 __raw_writel(
 (job-reloc_addr_phys[i] +
 reloc-target_offset)  reloc-shift,
 
 No - the slot can be anywhere. That's why we have cmdbuf_offset in the
 reloc struct.
 

Yes. Sorry I must misread the code before.

 @@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct
 platform_device *pdev)
 }
 }

 +   /* if (host1x_firewall  !err) { */
 if (host1x_firewall) {
 err = copy_gathers(job, pdev);
 if (err) {
 
 Will add.
 
 @@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct
 platform_device *pdev)
 }
 }

 +/* Rename this label to out or something else.
 +   Because if everything goes right, the codes under this label also
 +   get executed. */
  fail:
 wmb();
 
 Will do.
 

 diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
 index f3954f7..bb5763d 100644
 --- a/drivers/gpu/host1x/memmgr.c
 +++ b/drivers/gpu/host1x/memmgr.c
 @@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct
 platform_device *dev,
 count, unpin_data[pin_count],
 phys_addr);

 +   /* I don't think the current host1x_cma_pin_array_ids
 +  

Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 03.01.2013 05:31, Mark Zhang wrote:
 Sorry I didn't get it. Yes, in current design, you can pin all mem
 handles in one time but I haven't found anything related with locking
 only once per submit.
 
 My idea is:
 - remove job-addr_phys
 - assign job-reloc_addr_phys  job-gather_addr_phys separately
 - In pin_job_mem, just call host1x_memmgr_pin_array_ids twice to
 fill the reloc_addr_phy  gather_addr_phys.
 
 Anything I misunderstood?

The current design uses CMA, which makes pinning basically a no-op. When
we have IOMMU support, pinning requires calling into IOMMU. Updating
SMMU tables requires locking, and probably maintenance before SMMU code
also requires its own locked data tables. For example, preventing
duplicate pinning might require a global table of handles.

Putting all of the handles in one table allows doing duplicate detection
across cmdbuf and reloc tables. This allows pinning each buffer exactly
once, which reduces number of calls to IOMMU.

 host1x_cma_pin_array_ids doesn't return negative value right now, so
 maybe you need to take a look at it.

True, and also a consequence of using CMA: pinning can never fail. With
IOMMU, pinning can fail.

Terje
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Mark Zhang
On 01/03/2013 01:50 PM, Terje Bergström wrote:
 On 03.01.2013 05:31, Mark Zhang wrote:
 Sorry I didn't get it. Yes, in current design, you can pin all mem
 handles in one time but I haven't found anything related with locking
 only once per submit.

 My idea is:
 - remove job-addr_phys
 - assign job-reloc_addr_phys  job-gather_addr_phys separately
 - In pin_job_mem, just call host1x_memmgr_pin_array_ids twice to
 fill the reloc_addr_phy  gather_addr_phys.

 Anything I misunderstood?
 
 The current design uses CMA, which makes pinning basically a no-op. When
 we have IOMMU support, pinning requires calling into IOMMU. Updating
 SMMU tables requires locking, and probably maintenance before SMMU code
 also requires its own locked data tables. For example, preventing
 duplicate pinning might require a global table of handles.
 
 Putting all of the handles in one table allows doing duplicate detection
 across cmdbuf and reloc tables. This allows pinning each buffer exactly
 once, which reduces number of calls to IOMMU.
 

Thanks Terje. Yes, I understand IOMMU will bring in more operations. But
I'm not convinced that separating 2 arrays have lots of overheads than
putting them into one array.

But it doesn't matter, after the IOMMU support version released, I can
read this part of codes again. Let's talk about this at that time.

 host1x_cma_pin_array_ids doesn't return negative value right now, so
 maybe you need to take a look at it.
 
 True, and also a consequence of using CMA: pinning can never fail. With
 IOMMU, pinning can fail.

Got it. Agree.

 
 Terje
 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 21.12.2012 15:50, Lucas Stach wrote:
 Am Freitag, den 21.12.2012, 13:39 +0200 schrieb Terje Bergstrom:
 Some of the issues left open:
  * Register definitions still use static inline. There has been a
debate about code style versus ability to use compiler type
checking and code coverage analysis. There was no conclusion, so
I left it as was.
 This has to be resolved before merging. Personally I'm in favour of
 keeping reg access patterns close to what is done in other parts of the
 kernel.

How about if I define inline functions, and #defines to proxy to them?
Something like:

static inline u32 host1x_sync_cfpeek_ctrl_r(void)
{
return 0x74c;
}
#define HOST1X_SYNC_CFPEEK_CTRL \
host1x_sync_cfpeek_ctrl_r()

static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_f(u32 v)
{
return (v  0x1)  31;
}
#define HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ENA_F(v) \
host1x_sync_cfpeek_ctrl_cfpeek_ena_f(v)

I'd use the macros in .c files. That way the references will look
familiar to reader of .c files, but we can still get the benefits of
compiler processing (type checking, better error codes etc) and gcov
coverage tracking.

Terje
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-28 Thread Mark Zhang
Hi Terje,

Here is the second part comments. I admit I still haven't finished
reading the codes... really too many codes. :)
Anyway I'll keep doing this when I have free slots.

diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index a936902..c3ded60 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context
*context,
if (err)
goto fail;

+   /* We define CMA as an temporary solution in host1x driver.
+  That's also why we have a CMA kernel config in it.
+  But seems here, in tegradrm, we hardcode the CMA here.
+  So what do you do when host1x change to IOMMU?
+  Do we also need to define a CMA config in tegradrm
driver,
+  then after host1x changes to IOMMU, we add another IOMMU
+  config in tegradrm? Or we should invent another more
+  generic wrapper layer here? */
cmdbuf.mem = handle_cma_to_host1x(drm, file_priv,
cmdbuf.mem);
if (!cmdbuf.mem)
goto fail;
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index cc8ca2f..0867b72 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct
host1x_channel *ch,
mem += num_unpins * sizeof(dma_addr_t);
job->pin_ids = num_unpins ? mem : NULL;

+   /* I think this is a somewhat ugly design.
+  Actually "addr_phys" is consisted by "reloc_addr_phys" and
+  "gather_addr_phys".
+  Why don't we just declare "reloc_addr_phys" and
"gather_addr_phys"?
+  In current design, let's say if one nvhost newbie changes the
order
+  of these 2 blocks of code in function "pin_job_mem":
+
+  for (i = 0; i < job->num_relocs; i++) {
+   struct host1x_reloc *reloc = >relocarray[i];
+   job->pin_ids[count] = reloc->target;
+   count++;
+  }
+
+  for (i = 0; i < job->num_gathers; i++) {
+   struct host1x_job_gather *g = >gathers[i];
+   job->pin_ids[count] = g->mem_id;
+   count++;
+  }
+
+  Then likely something weird occurs... */
job->reloc_addr_phys = job->addr_phys;
job->gather_addr_phys = >addr_phys[num_relocs];

@@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
}
}

+   /* I have question here. Does this mean the address info
+  which need to be relocated(according to the libdrm codes,
+  seems this address is "0xDEADBEEF") always staying at the
+  beginning of a page? */
__raw_writel(
(job->reloc_addr_phys[i] +
reloc->target_offset) >> reloc->shift,
@@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct
platform_device *pdev)
}
}

+   /* if (host1x_firewall && !err) { */
if (host1x_firewall) {
err = copy_gathers(job, pdev);
if (err) {
@@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct
platform_device *pdev)
}
}

+/* Rename this label to "out" or something else.
+   Because if everything goes right, the codes under this label also
+   get executed. */
 fail:
wmb();

diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
index f3954f7..bb5763d 100644
--- a/drivers/gpu/host1x/memmgr.c
+++ b/drivers/gpu/host1x/memmgr.c
@@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct
platform_device *dev,
count, _data[pin_count],
phys_addr);

+   /* I don't think the current "host1x_cma_pin_array_ids"
+  is able to return a negative value. So this "if" doesn't
+  make sense...*/
if (cma_count < 0) {
/* clean up previous handles */
while (pin_count) {

Mark
On 12/21/2012 07:39 PM, Terje Bergstrom wrote:
> This set of patches adds support for Tegra20 and Tegra30 host1x and
> 2D. It is based on linux-next-20121220.
> 
> The fourth version has only few changes compared to previous version:
>  * Fixed some sparse warnings
>  * Fixed host1x Makefile to allow building as module
>  * Fixed host1x module unload
>  * Fixed tegradrm unload sequence
>  * host1x creates DRM dummy device and tegradrm uses it for storing
>DRM related data.
> 
> Some of the issues left open:
>  * Register definitions still use static inline. There has been a
>debate about code style versus ability to use compiler type
>checking and code coverage analysis. There was no conclusion, so
>I left it as was.
>  * Uses still CMA helpers. 

Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-28 Thread Mark Zhang
Hi Terje,

Here is the second part comments. I admit I still haven't finished
reading the codes... really too many codes. :)
Anyway I'll keep doing this when I have free slots.

diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index a936902..c3ded60 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context
*context,
if (err)
goto fail;

+   /* We define CMA as an temporary solution in host1x driver.
+  That's also why we have a CMA kernel config in it.
+  But seems here, in tegradrm, we hardcode the CMA here.
+  So what do you do when host1x change to IOMMU?
+  Do we also need to define a CMA config in tegradrm
driver,
+  then after host1x changes to IOMMU, we add another IOMMU
+  config in tegradrm? Or we should invent another more
+  generic wrapper layer here? */
cmdbuf.mem = handle_cma_to_host1x(drm, file_priv,
cmdbuf.mem);
if (!cmdbuf.mem)
goto fail;
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index cc8ca2f..0867b72 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct
host1x_channel *ch,
mem += num_unpins * sizeof(dma_addr_t);
job-pin_ids = num_unpins ? mem : NULL;

+   /* I think this is a somewhat ugly design.
+  Actually addr_phys is consisted by reloc_addr_phys and
+  gather_addr_phys.
+  Why don't we just declare reloc_addr_phys and
gather_addr_phys?
+  In current design, let's say if one nvhost newbie changes the
order
+  of these 2 blocks of code in function pin_job_mem:
+
+  for (i = 0; i  job-num_relocs; i++) {
+   struct host1x_reloc *reloc = job-relocarray[i];
+   job-pin_ids[count] = reloc-target;
+   count++;
+  }
+
+  for (i = 0; i  job-num_gathers; i++) {
+   struct host1x_job_gather *g = job-gathers[i];
+   job-pin_ids[count] = g-mem_id;
+   count++;
+  }
+
+  Then likely something weird occurs... */
job-reloc_addr_phys = job-addr_phys;
job-gather_addr_phys = job-addr_phys[num_relocs];

@@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
}
}

+   /* I have question here. Does this mean the address info
+  which need to be relocated(according to the libdrm codes,
+  seems this address is 0xDEADBEEF) always staying at the
+  beginning of a page? */
__raw_writel(
(job-reloc_addr_phys[i] +
reloc-target_offset)  reloc-shift,
@@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct
platform_device *pdev)
}
}

+   /* if (host1x_firewall  !err) { */
if (host1x_firewall) {
err = copy_gathers(job, pdev);
if (err) {
@@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct
platform_device *pdev)
}
}

+/* Rename this label to out or something else.
+   Because if everything goes right, the codes under this label also
+   get executed. */
 fail:
wmb();

diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
index f3954f7..bb5763d 100644
--- a/drivers/gpu/host1x/memmgr.c
+++ b/drivers/gpu/host1x/memmgr.c
@@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct
platform_device *dev,
count, unpin_data[pin_count],
phys_addr);

+   /* I don't think the current host1x_cma_pin_array_ids
+  is able to return a negative value. So this if doesn't
+  make sense...*/
if (cma_count  0) {
/* clean up previous handles */
while (pin_count) {

Mark
On 12/21/2012 07:39 PM, Terje Bergstrom wrote:
 This set of patches adds support for Tegra20 and Tegra30 host1x and
 2D. It is based on linux-next-20121220.
 
 The fourth version has only few changes compared to previous version:
  * Fixed some sparse warnings
  * Fixed host1x Makefile to allow building as module
  * Fixed host1x module unload
  * Fixed tegradrm unload sequence
  * host1x creates DRM dummy device and tegradrm uses it for storing
DRM related data.
 
 Some of the issues left open:
  * Register definitions still use static inline. There has been a
debate about code style versus ability to use compiler type
checking and code coverage analysis. There was no conclusion, so
I left it as was.
  * Uses still CMA helpers. IOMMU support will replace CMA with host1x
 

[PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-26 Thread Mark Zhang
Hi Terje,

I applied your patches on top of upstream 1224 kernel. Then I read the
codes. So here is my review comments(I use "git diff" to print out,
check below). I admit it's easy for me to not need to find the
corresponding lines in your 8 patch mails, but I've no idea whether it
is ok for you. If this makes you not feeling good, I'll do this in old
ways. Basically, I think in this diff output, there are filename/line
number/function name, so it should not be a hard work for you to
understand my comments.

P.S: I haven't finished the review so here is what I found today.

diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index a936902..28954b3 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -247,6 +247,10 @@ static int __devinit gr2d_probe(struct
platform_device *dev)
 {
int err;
struct gr2d *gr2d = NULL;
+   /* Cause drm_device is created in host1x driver. So
+  if host1x drivers loads after tegradrm, then in this
+  gr2d_probe function, this "drm_device" will be NULL.
+  How can we handle this? Defer driver probe? */
struct platform_device *drm_device =
host1x_drm_device(to_platform_device(dev->dev.parent));
struct tegradrm *tegradrm = platform_get_drvdata(drm_device);
@@ -273,6 +277,11 @@ static int __devinit gr2d_probe(struct
platform_device *dev)

gr2d->syncpt = host1x_syncpt_alloc(dev, 0);
if (!gr2d->syncpt) {
+   /* Do we really need this?
+  After "host1x_channel_alloc", the refcount of this
+  channel is 0. So calling host1x_channel_put here will
+  make the refcount going to negative.
+  I suppose we should call "host1x_free_channel" here. */
host1x_channel_put(gr2d->channel);
return -ENOMEM;
}
@@ -284,6 +293,7 @@ static int __devinit gr2d_probe(struct
platform_device *dev)

err = tegra_drm_register_client(tegradrm, >client);
if (err)
+   /* Add "host1x_free_channel" */
return err;

platform_set_drvdata(dev, gr2d);
diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
index 3705cae..ed83b9e 100644
--- a/drivers/gpu/host1x/channel.c
+++ b/drivers/gpu/host1x/channel.c
@@ -95,6 +95,9 @@ struct host1x_channel *host1x_channel_alloc(struct
platform_device *pdev)
int max_channels = host1x->info.nb_channels;
int err;

+   /* Is it necessary to add mutex protection here?
+  I'm just wondering in a smp system, multiple host1x clients
+  may try to alloc their channels simultaneously... */
if (chindex > max_channels)
return NULL;

diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
index 86d5c70..f2bd5aa 100644
--- a/drivers/gpu/host1x/debug.c
+++ b/drivers/gpu/host1x/debug.c
@@ -164,6 +164,10 @@ static const struct file_operations
host1x_debug_fops = {

 void host1x_debug_init(struct host1x *host1x)
 {
+   /* I think it's better to set this directory name the same with
+  the driver's name -- defined in dev.c:
+  #define DRIVER_NAME  "tegra-host1x"
+  */
struct dentry *de = debugfs_create_dir("tegra_host", NULL);

if (!de)
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 07e8813..01ed10d 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -38,6 +38,7 @@

 struct host1x *host1x;

+/* Called by drm unlocked ioctl function. So do we need a lock here? */
 void host1x_syncpt_incr_byid(u32 id)
 {
struct host1x_syncpt *sp = host1x->syncpt + id;
@@ -52,6 +53,7 @@ u32 host1x_syncpt_read_byid(u32 id)
 }
 EXPORT_SYMBOL(host1x_syncpt_read_byid);

+/* Called by drm unlocked ioctl function. So do we need a lock here? */
 int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value)
 {
struct host1x_syncpt *sp = host1x->syncpt + id;
@@ -161,6 +163,8 @@ static int host1x_probe(struct platform_device *dev)

err = host1x_alloc_resources(host);
if (err) {
+   /* We don't have chip_ops right now, so here the
+  error message is somewhat improper */
dev_err(>dev, "failed to init chip support\n");
goto fail;
}
@@ -175,6 +179,14 @@ static int host1x_probe(struct platform_device *dev)
if (!host->syncpt)
goto fail;

+   /* I suggest create a dedicate function for initializing nop sp.
+  First this "_host1x_syncpt_alloc" looks like an internal/static
+  function.
+  Then actually "host1x_syncpt_alloc" & "_host1x_syncpt_alloc" all
+  serve host1x client(e.g: gr2d) so it's not suitable to use them
+  for nop sp.
+  Just create a wrapper function to call _host1x_syncpt_alloc is OK.
+  This will make the code more readable. */
host->nop_sp = 

Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-26 Thread Mark Zhang
Hi Terje,

I applied your patches on top of upstream 1224 kernel. Then I read the
codes. So here is my review comments(I use git diff to print out,
check below). I admit it's easy for me to not need to find the
corresponding lines in your 8 patch mails, but I've no idea whether it
is ok for you. If this makes you not feeling good, I'll do this in old
ways. Basically, I think in this diff output, there are filename/line
number/function name, so it should not be a hard work for you to
understand my comments.

P.S: I haven't finished the review so here is what I found today.

diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index a936902..28954b3 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -247,6 +247,10 @@ static int __devinit gr2d_probe(struct
platform_device *dev)
 {
int err;
struct gr2d *gr2d = NULL;
+   /* Cause drm_device is created in host1x driver. So
+  if host1x drivers loads after tegradrm, then in this
+  gr2d_probe function, this drm_device will be NULL.
+  How can we handle this? Defer driver probe? */
struct platform_device *drm_device =
host1x_drm_device(to_platform_device(dev-dev.parent));
struct tegradrm *tegradrm = platform_get_drvdata(drm_device);
@@ -273,6 +277,11 @@ static int __devinit gr2d_probe(struct
platform_device *dev)

gr2d-syncpt = host1x_syncpt_alloc(dev, 0);
if (!gr2d-syncpt) {
+   /* Do we really need this?
+  After host1x_channel_alloc, the refcount of this
+  channel is 0. So calling host1x_channel_put here will
+  make the refcount going to negative.
+  I suppose we should call host1x_free_channel here. */
host1x_channel_put(gr2d-channel);
return -ENOMEM;
}
@@ -284,6 +293,7 @@ static int __devinit gr2d_probe(struct
platform_device *dev)

err = tegra_drm_register_client(tegradrm, gr2d-client);
if (err)
+   /* Add host1x_free_channel */
return err;

platform_set_drvdata(dev, gr2d);
diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
index 3705cae..ed83b9e 100644
--- a/drivers/gpu/host1x/channel.c
+++ b/drivers/gpu/host1x/channel.c
@@ -95,6 +95,9 @@ struct host1x_channel *host1x_channel_alloc(struct
platform_device *pdev)
int max_channels = host1x-info.nb_channels;
int err;

+   /* Is it necessary to add mutex protection here?
+  I'm just wondering in a smp system, multiple host1x clients
+  may try to alloc their channels simultaneously... */
if (chindex  max_channels)
return NULL;

diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
index 86d5c70..f2bd5aa 100644
--- a/drivers/gpu/host1x/debug.c
+++ b/drivers/gpu/host1x/debug.c
@@ -164,6 +164,10 @@ static const struct file_operations
host1x_debug_fops = {

 void host1x_debug_init(struct host1x *host1x)
 {
+   /* I think it's better to set this directory name the same with
+  the driver's name -- defined in dev.c:
+  #define DRIVER_NAME  tegra-host1x
+  */
struct dentry *de = debugfs_create_dir(tegra_host, NULL);

if (!de)
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 07e8813..01ed10d 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -38,6 +38,7 @@

 struct host1x *host1x;

+/* Called by drm unlocked ioctl function. So do we need a lock here? */
 void host1x_syncpt_incr_byid(u32 id)
 {
struct host1x_syncpt *sp = host1x-syncpt + id;
@@ -52,6 +53,7 @@ u32 host1x_syncpt_read_byid(u32 id)
 }
 EXPORT_SYMBOL(host1x_syncpt_read_byid);

+/* Called by drm unlocked ioctl function. So do we need a lock here? */
 int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value)
 {
struct host1x_syncpt *sp = host1x-syncpt + id;
@@ -161,6 +163,8 @@ static int host1x_probe(struct platform_device *dev)

err = host1x_alloc_resources(host);
if (err) {
+   /* We don't have chip_ops right now, so here the
+  error message is somewhat improper */
dev_err(dev-dev, failed to init chip support\n);
goto fail;
}
@@ -175,6 +179,14 @@ static int host1x_probe(struct platform_device *dev)
if (!host-syncpt)
goto fail;

+   /* I suggest create a dedicate function for initializing nop sp.
+  First this _host1x_syncpt_alloc looks like an internal/static
+  function.
+  Then actually host1x_syncpt_alloc  _host1x_syncpt_alloc all
+  serve host1x client(e.g: gr2d) so it's not suitable to use them
+  for nop sp.
+  Just create a wrapper function to call _host1x_syncpt_alloc is OK.
+  This will make the code more readable. */
host-nop_sp = _host1x_syncpt_alloc(host, 

[PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-21 Thread Terje Bergström
On 21.12.2012 15:50, Lucas Stach wrote:
> This has to be resolved before merging. Personally I'm in favour of
> keeping reg access patterns close to what is done in other parts of the
> kernel.

I haven't so far received comments from too many people, so that's why I
left it as is.

>>  * Uses still CMA helpers. IOMMU support will replace CMA with host1x
>>specific allocator.
> 
> I really want the allocator issue resolved before talking about merging
> those patches. Proper IOMMU support will require a bit of rework of the
> current upstream IOMMU API, so it will still be a bit out.
> 
> But if you don't mind I'll try to implement the host1x allocator over
> the holidays and provide an incremental patch.

Do you mean host1x CMA allocator using dma mapping API? That'd be great
if you could help out. I've just tried to concentrate on getting ok for
base essentials before doing any extra work.

Terje



[PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-21 Thread Lucas Stach
Am Freitag, den 21.12.2012, 15:57 +0200 schrieb Terje Bergstr?m:
> On 21.12.2012 15:50, Lucas Stach wrote:
> > This has to be resolved before merging. Personally I'm in favour of
> > keeping reg access patterns close to what is done in other parts of the
> > kernel.
> 
> I haven't so far received comments from too many people, so that's why I
> left it as is.
> 
> >>  * Uses still CMA helpers. IOMMU support will replace CMA with host1x
> >>specific allocator.
> > 
> > I really want the allocator issue resolved before talking about merging
> > those patches. Proper IOMMU support will require a bit of rework of the
> > current upstream IOMMU API, so it will still be a bit out.
> > 
> > But if you don't mind I'll try to implement the host1x allocator over
> > the holidays and provide an incremental patch.
> 
> Do you mean host1x CMA allocator using dma mapping API? That'd be great
> if you could help out. I've just tried to concentrate on getting ok for
> base essentials before doing any extra work.
> 
Yes, exactly that.

I'll provide comments to the other patches when I find time to do so.
Digging in the code might help with this.

Regards,
Lucas



[PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-21 Thread Lucas Stach
Am Freitag, den 21.12.2012, 13:39 +0200 schrieb Terje Bergstrom:
> This set of patches adds support for Tegra20 and Tegra30 host1x and
> 2D. It is based on linux-next-20121220.
> 
> The fourth version has only few changes compared to previous version:
>  * Fixed some sparse warnings
>  * Fixed host1x Makefile to allow building as module
>  * Fixed host1x module unload
>  * Fixed tegradrm unload sequence
>  * host1x creates DRM dummy device and tegradrm uses it for storing
>DRM related data.
> 
> Some of the issues left open:
>  * Register definitions still use static inline. There has been a
>debate about code style versus ability to use compiler type
>checking and code coverage analysis. There was no conclusion, so
>I left it as was.
This has to be resolved before merging. Personally I'm in favour of
keeping reg access patterns close to what is done in other parts of the
kernel.

>  * Uses still CMA helpers. IOMMU support will replace CMA with host1x
>specific allocator.

I really want the allocator issue resolved before talking about merging
those patches. Proper IOMMU support will require a bit of rework of the
current upstream IOMMU API, so it will still be a bit out.

But if you don't mind I'll try to implement the host1x allocator over
the holidays and provide an incremental patch.

Regards,
Lucas



[PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-21 Thread Terje Bergstrom
This set of patches adds support for Tegra20 and Tegra30 host1x and
2D. It is based on linux-next-20121220.

The fourth version has only few changes compared to previous version:
 * Fixed some sparse warnings
 * Fixed host1x Makefile to allow building as module
 * Fixed host1x module unload
 * Fixed tegradrm unload sequence
 * host1x creates DRM dummy device and tegradrm uses it for storing
   DRM related data.

Some of the issues left open:
 * Register definitions still use static inline. There has been a
   debate about code style versus ability to use compiler type
   checking and code coverage analysis. There was no conclusion, so
   I left it as was.
 * Uses still CMA helpers. IOMMU support will replace CMA with host1x
   specific allocator.

host1x is the driver that controls host1x hardware. It supports
host1x command channels, synchronization, and memory management. It
is sectioned into logical driver under drivers/gpu/host1x and
physical driver under drivers/host1x/hw. The physical driver is
compiled with the hardware headers of the particular host1x version.

The hardware units are described (briefly) in the Tegra2 TRM. Wiki
page https://gitorious.org/linux-tegra-drm/pages/Host1xIntroduction
also contains a short description of the functionality.

The patch set removes responsibility of host1x from tegradrm. At the
same time, it moves all drm related infrastructure in
drivers/gpu/drm/tegra/host1x.c to drm.c. To replace the host1x
central device, host1x creates a dummy device for tegradrm to hang
global data to. This is a break in responsibility split of tegradrm
taking care of DRM and host1x driver taking care of host1x and
clients, but this structure was insisted. I've kept creation of dummy
device as a separate patch to illustrate the alternatives. It can be
later squashed into other patches.

The patch set adds 2D driver to tegradrm, which uses host1x for
communicating with host1x to access sync points and channels. We
expect to use the same infrastructure for other host1x clients, so
we have kept host1x and tegradrm separate.

The patch set also adds user space API to tegradrm for accessing
host1x and 2D.

Arto Merilainen (1):
  drm: tegra: Remove redundant host1x

Terje Bergstrom (7):
  gpu: host1x: Add host1x driver
  gpu: host1x: Add syncpoint wait and interrupts
  gpu: host1x: Add channel support
  gpu: host1x: Add debug support
  ARM: tegra: Add board data and 2D clocks
  drm: tegra: Add gr2d device
  gpu: host1x: Register DRM dummy device

 arch/arm/mach-tegra/board-dt-tegra20.c  |1 +
 arch/arm/mach-tegra/board-dt-tegra30.c  |1 +
 arch/arm/mach-tegra/tegra20_clocks_data.c   |2 +-
 arch/arm/mach-tegra/tegra30_clocks_data.c   |1 +
 drivers/gpu/Makefile|1 +
 drivers/gpu/drm/tegra/Kconfig   |2 +-
 drivers/gpu/drm/tegra/Makefile  |2 +-
 drivers/gpu/drm/tegra/dc.c  |   26 +-
 drivers/gpu/drm/tegra/drm.c |  433 ++-
 drivers/gpu/drm/tegra/drm.h |   72 ++--
 drivers/gpu/drm/tegra/fb.c  |   17 +-
 drivers/gpu/drm/tegra/gr2d.c|  309 ++
 drivers/gpu/drm/tegra/hdmi.c|   30 +-
 drivers/gpu/drm/tegra/host1x.c  |  325 --
 drivers/gpu/host1x/Kconfig  |   28 ++
 drivers/gpu/host1x/Makefile |   17 +
 drivers/gpu/host1x/cdma.c   |  475 
 drivers/gpu/host1x/cdma.h   |  107 +
 drivers/gpu/host1x/channel.c|  137 ++
 drivers/gpu/host1x/channel.h|   64 +++
 drivers/gpu/host1x/cma.c|  117 +
 drivers/gpu/host1x/cma.h|   43 ++
 drivers/gpu/host1x/debug.c  |  207 +
 drivers/gpu/host1x/debug.h  |   49 +++
 drivers/gpu/host1x/dev.c|  242 +++
 drivers/gpu/host1x/dev.h|  167 
 drivers/gpu/host1x/drm.c|   51 +++
 drivers/gpu/host1x/drm.h|   25 ++
 drivers/gpu/host1x/hw/Makefile  |6 +
 drivers/gpu/host1x/hw/cdma_hw.c |  480 +
 drivers/gpu/host1x/hw/cdma_hw.h |   37 ++
 drivers/gpu/host1x/hw/channel_hw.c  |  147 +++
 drivers/gpu/host1x/hw/debug_hw.c|  399 +
 drivers/gpu/host1x/hw/host1x01.c|   46 ++
 drivers/gpu/host1x/hw/host1x01.h|   25 ++
 drivers/gpu/host1x/hw/host1x01_hardware.h   |  150 +++
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |   98 +
 drivers/gpu/host1x/hw/hw_host1x01_sync.h|  179 
 drivers/gpu/host1x/hw/hw_host1x01_uclass.h  |  130 ++
 drivers/gpu/host1x/hw/intr_hw.c |  178 
 drivers/gpu/host1x/hw/syncpt_hw.c   |  157 +++
 drivers/gpu/host1x/intr.c   |  377 
 

[PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-21 Thread Terje Bergstrom
This set of patches adds support for Tegra20 and Tegra30 host1x and
2D. It is based on linux-next-20121220.

The fourth version has only few changes compared to previous version:
 * Fixed some sparse warnings
 * Fixed host1x Makefile to allow building as module
 * Fixed host1x module unload
 * Fixed tegradrm unload sequence
 * host1x creates DRM dummy device and tegradrm uses it for storing
   DRM related data.

Some of the issues left open:
 * Register definitions still use static inline. There has been a
   debate about code style versus ability to use compiler type
   checking and code coverage analysis. There was no conclusion, so
   I left it as was.
 * Uses still CMA helpers. IOMMU support will replace CMA with host1x
   specific allocator.

host1x is the driver that controls host1x hardware. It supports
host1x command channels, synchronization, and memory management. It
is sectioned into logical driver under drivers/gpu/host1x and
physical driver under drivers/host1x/hw. The physical driver is
compiled with the hardware headers of the particular host1x version.

The hardware units are described (briefly) in the Tegra2 TRM. Wiki
page https://gitorious.org/linux-tegra-drm/pages/Host1xIntroduction
also contains a short description of the functionality.

The patch set removes responsibility of host1x from tegradrm. At the
same time, it moves all drm related infrastructure in
drivers/gpu/drm/tegra/host1x.c to drm.c. To replace the host1x
central device, host1x creates a dummy device for tegradrm to hang
global data to. This is a break in responsibility split of tegradrm
taking care of DRM and host1x driver taking care of host1x and
clients, but this structure was insisted. I've kept creation of dummy
device as a separate patch to illustrate the alternatives. It can be
later squashed into other patches.

The patch set adds 2D driver to tegradrm, which uses host1x for
communicating with host1x to access sync points and channels. We
expect to use the same infrastructure for other host1x clients, so
we have kept host1x and tegradrm separate.

The patch set also adds user space API to tegradrm for accessing
host1x and 2D.

Arto Merilainen (1):
  drm: tegra: Remove redundant host1x

Terje Bergstrom (7):
  gpu: host1x: Add host1x driver
  gpu: host1x: Add syncpoint wait and interrupts
  gpu: host1x: Add channel support
  gpu: host1x: Add debug support
  ARM: tegra: Add board data and 2D clocks
  drm: tegra: Add gr2d device
  gpu: host1x: Register DRM dummy device

 arch/arm/mach-tegra/board-dt-tegra20.c  |1 +
 arch/arm/mach-tegra/board-dt-tegra30.c  |1 +
 arch/arm/mach-tegra/tegra20_clocks_data.c   |2 +-
 arch/arm/mach-tegra/tegra30_clocks_data.c   |1 +
 drivers/gpu/Makefile|1 +
 drivers/gpu/drm/tegra/Kconfig   |2 +-
 drivers/gpu/drm/tegra/Makefile  |2 +-
 drivers/gpu/drm/tegra/dc.c  |   26 +-
 drivers/gpu/drm/tegra/drm.c |  433 ++-
 drivers/gpu/drm/tegra/drm.h |   72 ++--
 drivers/gpu/drm/tegra/fb.c  |   17 +-
 drivers/gpu/drm/tegra/gr2d.c|  309 ++
 drivers/gpu/drm/tegra/hdmi.c|   30 +-
 drivers/gpu/drm/tegra/host1x.c  |  325 --
 drivers/gpu/host1x/Kconfig  |   28 ++
 drivers/gpu/host1x/Makefile |   17 +
 drivers/gpu/host1x/cdma.c   |  475 
 drivers/gpu/host1x/cdma.h   |  107 +
 drivers/gpu/host1x/channel.c|  137 ++
 drivers/gpu/host1x/channel.h|   64 +++
 drivers/gpu/host1x/cma.c|  117 +
 drivers/gpu/host1x/cma.h|   43 ++
 drivers/gpu/host1x/debug.c  |  207 +
 drivers/gpu/host1x/debug.h  |   49 +++
 drivers/gpu/host1x/dev.c|  242 +++
 drivers/gpu/host1x/dev.h|  167 
 drivers/gpu/host1x/drm.c|   51 +++
 drivers/gpu/host1x/drm.h|   25 ++
 drivers/gpu/host1x/hw/Makefile  |6 +
 drivers/gpu/host1x/hw/cdma_hw.c |  480 +
 drivers/gpu/host1x/hw/cdma_hw.h |   37 ++
 drivers/gpu/host1x/hw/channel_hw.c  |  147 +++
 drivers/gpu/host1x/hw/debug_hw.c|  399 +
 drivers/gpu/host1x/hw/host1x01.c|   46 ++
 drivers/gpu/host1x/hw/host1x01.h|   25 ++
 drivers/gpu/host1x/hw/host1x01_hardware.h   |  150 +++
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |   98 +
 drivers/gpu/host1x/hw/hw_host1x01_sync.h|  179 
 drivers/gpu/host1x/hw/hw_host1x01_uclass.h  |  130 ++
 drivers/gpu/host1x/hw/intr_hw.c |  178 
 drivers/gpu/host1x/hw/syncpt_hw.c   |  157 +++
 drivers/gpu/host1x/intr.c   |  377 
 

Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-21 Thread Lucas Stach
Am Freitag, den 21.12.2012, 13:39 +0200 schrieb Terje Bergstrom:
 This set of patches adds support for Tegra20 and Tegra30 host1x and
 2D. It is based on linux-next-20121220.
 
 The fourth version has only few changes compared to previous version:
  * Fixed some sparse warnings
  * Fixed host1x Makefile to allow building as module
  * Fixed host1x module unload
  * Fixed tegradrm unload sequence
  * host1x creates DRM dummy device and tegradrm uses it for storing
DRM related data.
 
 Some of the issues left open:
  * Register definitions still use static inline. There has been a
debate about code style versus ability to use compiler type
checking and code coverage analysis. There was no conclusion, so
I left it as was.
This has to be resolved before merging. Personally I'm in favour of
keeping reg access patterns close to what is done in other parts of the
kernel.

  * Uses still CMA helpers. IOMMU support will replace CMA with host1x
specific allocator.

I really want the allocator issue resolved before talking about merging
those patches. Proper IOMMU support will require a bit of rework of the
current upstream IOMMU API, so it will still be a bit out.

But if you don't mind I'll try to implement the host1x allocator over
the holidays and provide an incremental patch.

Regards,
Lucas

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-21 Thread Terje Bergström
On 21.12.2012 15:50, Lucas Stach wrote:
 This has to be resolved before merging. Personally I'm in favour of
 keeping reg access patterns close to what is done in other parts of the
 kernel.

I haven't so far received comments from too many people, so that's why I
left it as is.

  * Uses still CMA helpers. IOMMU support will replace CMA with host1x
specific allocator.
 
 I really want the allocator issue resolved before talking about merging
 those patches. Proper IOMMU support will require a bit of rework of the
 current upstream IOMMU API, so it will still be a bit out.
 
 But if you don't mind I'll try to implement the host1x allocator over
 the holidays and provide an incremental patch.

Do you mean host1x CMA allocator using dma mapping API? That'd be great
if you could help out. I've just tried to concentrate on getting ok for
base essentials before doing any extra work.

Terje

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-21 Thread Lucas Stach
Am Freitag, den 21.12.2012, 15:57 +0200 schrieb Terje Bergström:
 On 21.12.2012 15:50, Lucas Stach wrote:
  This has to be resolved before merging. Personally I'm in favour of
  keeping reg access patterns close to what is done in other parts of the
  kernel.
 
 I haven't so far received comments from too many people, so that's why I
 left it as is.
 
   * Uses still CMA helpers. IOMMU support will replace CMA with host1x
 specific allocator.
  
  I really want the allocator issue resolved before talking about merging
  those patches. Proper IOMMU support will require a bit of rework of the
  current upstream IOMMU API, so it will still be a bit out.
  
  But if you don't mind I'll try to implement the host1x allocator over
  the holidays and provide an incremental patch.
 
 Do you mean host1x CMA allocator using dma mapping API? That'd be great
 if you could help out. I've just tried to concentrate on getting ok for
 base essentials before doing any extra work.
 
Yes, exactly that.

I'll provide comments to the other patches when I find time to do so.
Digging in the code might help with this.

Regards,
Lucas

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel