Re: For the problem when using swiotlb

2014-11-26 Thread Ding Tianhong
On 2014/11/25 20:23, Catalin Marinas wrote:
> On Tue, Nov 25, 2014 at 11:29:05AM +, Russell King - ARM Linux wrote:
>> On Tue, Nov 25, 2014 at 10:58:15AM +, Catalin Marinas wrote:
>>> Since we don't have a coherent_dma_supported() function, we defer the
>>> validity check of coherent_dma_mask to dma_alloc_coherent() (and here we
>>> can remove bouncing since swiotlb has relatively small buffers).
>>
>> Bouncing of coherent DMA buffers is insane; if you have to bounce them,
>> they're by definition not coherent.
> 
> "Bouncing" is not a proper term here. What I meant is not using the
> swiotlb bounce buffers for the dma_alloc_coherent(). The
> swiotlb_alloc_coherent() allows this but it doesn't do any
> copying/bouncing in such scenario (it would not make sense, as you said
> below).
> 

After the long discussion, I think the idea is clear: drivers should set the
correct dma_mask and coherent_dma_mask so that their buffers avoid swiotlb,
and my case is fixed this way.

On my aarch64 board, the coherent dma mask is always bigger than the swiotlb
bounce buffer, so dma_supported() makes no difference here, but it may be
useful for other boards.
Thanks everyone.

Regards
Ding

>> Think about one of the common uses of coherent DMA buffers: ring buffers,
>> where both the CPU and the DMA agent write to the ring:
>>
>> - CPU writes to ring, loading address and length, then writing to the
>>   status word for the ring entry.
>> - DMA reads the ring status word, sees it owns the entry, processes it,
>>   DMA writes to the ring status word to give it back.
>>
>> What this means is that if you are bouncing the buffer, you are copying
>> it whole-sale between the CPU visible version and the DMA visible
>> version, which means that you can miss DMA updates to it.  So, bouncing
>> a coherent DMA buffer is simply not an acceptable implementation for
>> dma_alloc_coherent().
> 
> I agree, not arguing against this.
> 
>> As for validity of masks, it is defined in the DMA API documentation that
>> if a DMA mask is suitable for the streaming APIs, then it is also suitable
>> for the coherent APIs.  The reverse is left open, and so may not
>> necessarily be true.
>>
>> In other words:
>>
>>  err = dma_set_mask(dev, mask);
>>  if (err == 0)
>>  assert(dma_set_coherent_mask(dev, mask) == 0);
>>
>> must always succeed, but reversing the two calls has no guarantee.
> 
> That would succeed on arm64 currently.
> 
> The problem is that swiotlb bounce buffers allow for smaller than 32-bit
> dma_mask for streaming DMA since it can do bouncing. With coherent
> allocations, we could allow the same smaller than 32-bit
> coherent_dma_mask, however allocations may fail since they fall back to
> the swiotlb reserved buffer which is relatively small.
> 
> What I want to avoid is threads like this where people ask for bigger
> swiotlb buffers for coherent allocations most likely because they do the
> wrong thing with their dma masks (or dma-ranges property). If the arm64
> dma_alloc_coherent() code avoids calling swiotlb_alloc_coherent() and
> just limits itself to ZONE_DMA, we make it clear that the swiotlb
> buffers are only used for streaming DMA (bouncing).
> 
> Once we avoid swiotlb buffers for coherent DMA, the problem is that even
> though your assertion above would pass, dma_alloc_coherent() may fail if
> coherent_dma_mask is smaller than the end of ZONE_DMA (and pfn offsets
> considered). Basically breaking the assumption that you mentioned above
> with regards to coherent_dma_mask suitability.
> 
> I think the best we can get is for dma_supported() to ignore
> swiotlb_dma_supported() and simply check for ZONE_DMA (similar to the
> arm32 one). We guarantee that swiotlb bounce buffers are within ZONE_DMA
> already, so no need for calling swiotlb_dma_supported(). If we ever have
> devices with smaller than 32-bit mask on arm64, we will have to reduce
> ZONE_DMA.
> 
>> Note that there seems to be only one driver which has different coherent
>> and streaming DMA masks today:
>>
>> drivers/media/pci/sta2x11/sta2x11_vip.c:
>> if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(26))) {
>> err = dma_set_coherent_mask(&vip->pdev->dev, DMA_BIT_MASK(29));
> 
> This would work if we have a ZONE_DMA of 29-bit while the swiotlb bounce
> buffer within 26-bit. If we go with the above ZONE_DMA only check, we
> would need ZONE_DMA of 26-bit.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: For the problem when using swiotlb

2014-11-25 Thread Catalin Marinas
On Tue, Nov 25, 2014 at 11:29:05AM +, Russell King - ARM Linux wrote:
> On Tue, Nov 25, 2014 at 10:58:15AM +, Catalin Marinas wrote:
> > Since we don't have a coherent_dma_supported() function, we defer the
> > validity check of coherent_dma_mask to dma_alloc_coherent() (and here we
> > can remove bouncing since swiotlb has relatively small buffers).
> 
> Bouncing of coherent DMA buffers is insane; if you have to bounce them,
> they're by definition not coherent.

"Bouncing" is not a proper term here. What I meant is not using the
swiotlb bounce buffers for the dma_alloc_coherent(). The
swiotlb_alloc_coherent() allows this but it doesn't do any
copying/bouncing in such scenario (it would not make sense, as you said
below).

> Think about one of the common uses of coherent DMA buffers: ring buffers,
> where both the CPU and the DMA agent write to the ring:
> 
> - CPU writes to ring, loading address and length, then writing to the
>   status word for the ring entry.
> - DMA reads the ring status word, sees it owns the entry, processes it,
>   DMA writes to the ring status word to give it back.
> 
> What this means is that if you are bouncing the buffer, you are copying
> it whole-sale between the CPU visible version and the DMA visible
> version, which means that you can miss DMA updates to it.  So, bouncing
> a coherent DMA buffer is simply not an acceptable implementation for
> dma_alloc_coherent().

I agree, not arguing against this.

> As for validity of masks, it is defined in the DMA API documentation that
> if a DMA mask is suitable for the streaming APIs, then it is also suitable
> for the coherent APIs.  The reverse is left open, and so may not
> necessarily be true.
> 
> In other words:
> 
>   err = dma_set_mask(dev, mask);
>   if (err == 0)
>   assert(dma_set_coherent_mask(dev, mask) == 0);
> 
> must always succeed, but reversing the two calls has no guarantee.

That would succeed on arm64 currently.

The problem is that swiotlb bounce buffers allow for smaller than 32-bit
dma_mask for streaming DMA since it can do bouncing. With coherent
allocations, we could allow the same smaller than 32-bit
coherent_dma_mask, however allocations may fail since they fall back to
the swiotlb reserved buffer which is relatively small.

What I want to avoid is threads like this where people ask for bigger
swiotlb buffers for coherent allocations most likely because they do the
wrong thing with their dma masks (or dma-ranges property). If the arm64
dma_alloc_coherent() code avoids calling swiotlb_alloc_coherent() and
just limits itself to ZONE_DMA, we make it clear that the swiotlb
buffers are only used for streaming DMA (bouncing).

Once we avoid swiotlb buffers for coherent DMA, the problem is that even
though your assertion above would pass, dma_alloc_coherent() may fail if
coherent_dma_mask is smaller than the end of ZONE_DMA (and pfn offsets
considered). Basically breaking the assumption that you mentioned above
with regards to coherent_dma_mask suitability.

I think the best we can get is for dma_supported() to ignore
swiotlb_dma_supported() and simply check for ZONE_DMA (similar to the
arm32 one). We guarantee that swiotlb bounce buffers are within ZONE_DMA
already, so no need for calling swiotlb_dma_supported(). If we ever have
devices with smaller than 32-bit mask on arm64, we will have to reduce
ZONE_DMA.

> Note that there seems to be only one driver which has different coherent
> and streaming DMA masks today:
> 
> drivers/media/pci/sta2x11/sta2x11_vip.c:
> if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(26))) {
> err = dma_set_coherent_mask(&vip->pdev->dev, DMA_BIT_MASK(29));

This would work if we have a ZONE_DMA of 29-bit while the swiotlb bounce
buffer within 26-bit. If we go with the above ZONE_DMA only check, we
would need ZONE_DMA of 26-bit.

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-25 Thread Russell King - ARM Linux
On Tue, Nov 25, 2014 at 10:58:15AM +, Catalin Marinas wrote:
> Since we don't have a coherent_dma_supported() function, we defer the
> validity check of coherent_dma_mask to dma_alloc_coherent() (and here we
> can remove bouncing since swiotlb has relatively small buffers).

Bouncing of coherent DMA buffers is insane; if you have to bounce them,
they're by definition not coherent.

Think about one of the common uses of coherent DMA buffers: ring buffers,
where both the CPU and the DMA agent write to the ring:

- CPU writes to ring, loading address and length, then writing to the
  status word for the ring entry.
- DMA reads the ring status word, sees it owns the entry, processes it,
  DMA writes to the ring status word to give it back.

What this means is that if you are bouncing the buffer, you are copying
it whole-sale between the CPU visible version and the DMA visible
version, which means that you can miss DMA updates to it.  So, bouncing
a coherent DMA buffer is simply not an acceptable implementation for
dma_alloc_coherent().

As for validity of masks, it is defined in the DMA API documentation that
if a DMA mask is suitable for the streaming APIs, then it is also suitable
for the coherent APIs.  The reverse is left open, and so may not
necessarily be true.

In other words:

err = dma_set_mask(dev, mask);
if (err == 0)
assert(dma_set_coherent_mask(dev, mask) == 0);

must always succeed, but reversing the two calls has no guarantee.

Note that there seems to be only one driver which has different coherent
and streaming DMA masks today:

drivers/media/pci/sta2x11/sta2x11_vip.c:
if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(26))) {
err = dma_set_coherent_mask(&vip->pdev->dev, DMA_BIT_MASK(29));

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: For the problem when using swiotlb

2014-11-25 Thread Catalin Marinas
On Mon, Nov 24, 2014 at 08:12:09PM +, Arnd Bergmann wrote:
> On Friday 21 November 2014 18:09:25 Catalin Marinas wrote:
> > On Fri, Nov 21, 2014 at 05:51:19PM +, Catalin Marinas wrote:
> > > On Fri, Nov 21, 2014 at 05:04:28PM +, Arnd Bergmann wrote:
> > > > On Friday 21 November 2014 16:57:09 Catalin Marinas wrote:
> > > > > There is a scenario where smaller mask would work on arm64. For example
> > > > > Juno, you can have 2GB of RAM in the 32-bit phys range (starting at
> > > > > 0x8000). A device with 31-bit mask and a dma_pfn_offset of
> > > > > 0x8000 would still work (there isn't any but just as an example). So
> > > > > the check in dma_alloc_coherent() would be something like:
> > > > > 
> > > > >   phys_to_dma(top of ZONE_DMA) - dma_pfn_offset <= coherent_dma_mask
> > > > > 
> > > > > (or assuming RAM starts at 0 and ignoring dma_pfn_offset for now)
> > > > > 
> > > > > If the condition above fails, dma_alloc_coherent() would no longer fall
> > > > > back to swiotlb but issue a dev_warn() and return NULL.
> > > > 
> > > > Ah, that looks like it should work on all architectures, very nice.
> > > > How about checking this condition, and then printing a small warning
> > > > (dev_warn, not WARN_ON) and setting the dma_mask pointer to NULL?
> > > 
> > > I would not add the above ZONE_DMA check to of_dma_configure(). For
> > > example on arm64, we may not support a small coherent_dma_mask but the
> > > same value for dma_mask could be fine via swiotlb bouncing (or IOMMU).
> > > However, that's an arch-specific decision. Maybe after the above setting
> > > of dev->coherent_dma_mask in of_dma_configure(), we could add:
> 
> You seem to implement the opposite:

Possibly, but I had something else in mind.

> > +   /*
> > +* If the bus dma-ranges property specifies a size smaller than 4GB,
> > +* the device would not be capable of accessing the whole 32-bit
> > +* space, so reduce the default coherent_dma_mask accordingly.
> > +*/
> > +   if (size && size < (1ULL << 32))
> > +   dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> > +
> > +   /*
> > +* Set dma_mask to coherent_dma_mask by default if the architecture
> > +* code has not set it and DMA on such mask is supported.
> > +*/
> > +   if (!dev->dma_mask && dma_supported(dev, dev->coherent_dma_mask))
> > +   dev->dma_mask = &dev->coherent_dma_mask;
> >  }
> 
> Here, coherent_dma_mask wouldn't work while dma_mask might be
> fine in case of swiotlb, but you set a nonzero coherent_dma_mask
> and an invalid dma_mask.

My assumption is that dma_supported() only checks the validity of
dma_mask (not the coherent one). On arm64 it is currently routed to
swiotlb_dma_supported() which returns true if the swiotlb bounce buffer
is within that mask. So if the coherent_dma_mask is enough for swiotlb
bounce buffer, we point dma_mask to it. Otherwise, there is no way to do
streaming DMA to such mask, hence setting it to NULL.

Since we don't have a coherent_dma_supported() function, we defer the
validity check of coherent_dma_mask to dma_alloc_coherent() (and here we
can remove bouncing since swiotlb has relatively small buffers).

There is a slight downside if dma_supported() on the default 32-bit mask
fails, we end up with dma_mask == NULL and the driver calling
dma_set_mask_and_coherent(64-bit) would fail to set dma_mask even though
dma-ranges allows it. Is this a real scenario (not for arm64 where we
allow DMA into 32-bit via ZONE_DMA)?

-- 
Catalin



Re: For the problem when using swiotlb

2014-11-24 Thread Arnd Bergmann
On Friday 21 November 2014 18:09:25 Catalin Marinas wrote:
> On Fri, Nov 21, 2014 at 05:51:19PM +, Catalin Marinas wrote:
> > On Fri, Nov 21, 2014 at 05:04:28PM +, Arnd Bergmann wrote:
> > > On Friday 21 November 2014 16:57:09 Catalin Marinas wrote:
> > > > There is a scenario where smaller mask would work on arm64. For example
> > > > Juno, you can have 2GB of RAM in the 32-bit phys range (starting at
> > > > 0x8000). A device with 31-bit mask and a dma_pfn_offset of
> > > > 0x8000 would still work (there isn't any but just as an example). So
> > > > the check in dma_alloc_coherent() would be something like:
> > > > 
> > > > phys_to_dma(top of ZONE_DMA) - dma_pfn_offset <= 
> > > > coherent_dma_mask
> > > > 
> > > > (or assuming RAM starts at 0 and ignoring dma_pfn_offset for now)
> > > > 
> > > > If the condition above fails, dma_alloc_coherent() would no longer fall
> > > > back to swiotlb but issue a dev_warn() and return NULL.
> > > 
> > > Ah, that looks like it should work on all architectures, very nice.
> > > How about checking this condition, and then printing a small warning
> > > (dev_warn, not WARN_ON) and setting the dma_mask pointer to NULL?
> > 
> > I would not add the above ZONE_DMA check to of_dma_configure(). For
> > example on arm64, we may not support a small coherent_dma_mask but the
> > same value for dma_mask could be fine via swiotlb bouncing (or IOMMU).
> > However, that's an arch-specific decision. Maybe after the above setting
> > of dev->coherent_dma_mask in of_dma_configure(), we could add:
> > 

You seem to implement the opposite:

> + /*
> +  * If the bus dma-ranges property specifies a size smaller than 4GB,
> +  * the device would not be capable of accessing the whole 32-bit
> +  * space, so reduce the default coherent_dma_mask accordingly.
> +  */
> + if (size && size < (1ULL << 32))
> + dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> +
> + /*
> +  * Set dma_mask to coherent_dma_mask by default if the architecture
> +  * code has not set it and DMA on such mask is supported.
> +  */
> + if (!dev->dma_mask && dma_supported(dev, dev->coherent_dma_mask))
> + dev->dma_mask = &dev->coherent_dma_mask;
>  }
>  

Here, coherent_dma_mask wouldn't work while dma_mask might be
fine in case of swiotlb, but you set a nonzero coherent_dma_mask
and an invalid dma_mask.

Arnd


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Fri, Nov 21, 2014 at 05:51:19PM +, Catalin Marinas wrote:
> On Fri, Nov 21, 2014 at 05:04:28PM +, Arnd Bergmann wrote:
> > On Friday 21 November 2014 16:57:09 Catalin Marinas wrote:
> > > There is a scenario where smaller mask would work on arm64. For example
> > > Juno, you can have 2GB of RAM in the 32-bit phys range (starting at
> > > 0x8000). A device with 31-bit mask and a dma_pfn_offset of
> > > 0x8000 would still work (there isn't any but just as an example). So
> > > the check in dma_alloc_coherent() would be something like:
> > > 
> > >   phys_to_dma(top of ZONE_DMA) - dma_pfn_offset <= coherent_dma_mask
> > > 
> > > (or assuming RAM starts at 0 and ignoring dma_pfn_offset for now)
> > > 
> > > If the condition above fails, dma_alloc_coherent() would no longer fall
> > > back to swiotlb but issue a dev_warn() and return NULL.
> > 
> > Ah, that looks like it should work on all architectures, very nice.
> > How about checking this condition, and then printing a small warning
> > (dev_warn, not WARN_ON) and setting the dma_mask pointer to NULL?
> 
> I would not add the above ZONE_DMA check to of_dma_configure(). For
> example on arm64, we may not support a small coherent_dma_mask but the
> same value for dma_mask could be fine via swiotlb bouncing (or IOMMU).
> However, that's an arch-specific decision. Maybe after the above setting
> of dev->coherent_dma_mask in of_dma_configure(), we could add:
> 
>   if (!dma_supported(dev, dev->coherent_dma_mask))
>   dev->dma_mask = NULL;
> 
> The arch dma_supported() can check the swiotlb bouncing or ZONE_DMA
> limits.

More precisely, that's what I meant:

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 3b64d0bf5bba..4fcdfed6a6df 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -172,13 +172,6 @@ static void of_dma_configure(struct device *dev)
dev->coherent_dma_mask = DMA_BIT_MASK(32);
 
/*
-* Set it to coherent_dma_mask by default if the architecture
-* code has not set it.
-*/
-   if (!dev->dma_mask)
-   dev->dma_mask = &dev->coherent_dma_mask;
-
-   /*
 * if dma-coherent property exist, call arch hook to setup
 * dma coherent operations.
 */
@@ -188,18 +181,32 @@ static void of_dma_configure(struct device *dev)
}
 
/*
-* if dma-ranges property doesn't exist - just return else
-* setup the dma offset
+* If dma-ranges property exists - setup the dma offset.
 */
	ret = of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size);
if (ret < 0) {
dev_dbg(dev, "no dma range information to setup\n");
-   return;
+   size = 0;
+   } else {
+   /* DMA ranges found. Calculate and set dma_pfn_offset */
+   dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
+   dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
}
 
-   /* DMA ranges found. Calculate and set dma_pfn_offset */
-   dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
-   dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
+   /*
+* If the bus dma-ranges property specifies a size smaller than 4GB,
+* the device would not be capable of accessing the whole 32-bit
+* space, so reduce the default coherent_dma_mask accordingly.
+*/
+   if (size && size < (1ULL << 32))
+   dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
+
+   /*
+* Set dma_mask to coherent_dma_mask by default if the architecture
+* code has not set it and DMA on such mask is supported.
+*/
+   if (!dev->dma_mask && dma_supported(dev, dev->coherent_dma_mask))
+   dev->dma_mask = &dev->coherent_dma_mask;
 }
 
 /**

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Fri, Nov 21, 2014 at 05:04:28PM +, Arnd Bergmann wrote:
> On Friday 21 November 2014 16:57:09 Catalin Marinas wrote:
> > There is a scenario where smaller mask would work on arm64. For example
> > Juno, you can have 2GB of RAM in the 32-bit phys range (starting at
> > 0x8000). A device with 31-bit mask and a dma_pfn_offset of
> > 0x8000 would still work (there isn't any but just as an example). So
> > the check in dma_alloc_coherent() would be something like:
> > 
> > phys_to_dma(top of ZONE_DMA) - dma_pfn_offset <= coherent_dma_mask
> > 
> > (or assuming RAM starts at 0 and ignoring dma_pfn_offset for now)
> > 
> > If the condition above fails, dma_alloc_coherent() would no longer fall
> > back to swiotlb but issue a dev_warn() and return NULL.
> 
> Ah, that looks like it should work on all architectures, very nice.
> How about checking this condition, and then printing a small warning
> (dev_warn, not WARN_ON) and setting the dma_mask pointer to NULL?

I would not add the above ZONE_DMA check to of_dma_configure(). For
example on arm64, we may not support a small coherent_dma_mask but the
same value for dma_mask could be fine via swiotlb bouncing (or IOMMU).
However, that's an arch-specific decision. Maybe after the above setting
of dev->coherent_dma_mask in of_dma_configure(), we could add:

if (!dma_supported(dev, dev->coherent_dma_mask))
dev->dma_mask = NULL;

The arch dma_supported() can check the swiotlb bouncing or ZONE_DMA
limits.

Strangely, we don't have a coherent_dma_supported(), but we can defer
such a check to dma_alloc_coherent(), and that's where we would check
the top of ZONE_DMA.

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-21 Thread Arnd Bergmann
On Friday 21 November 2014 16:57:09 Catalin Marinas wrote:
> On Fri, Nov 21, 2014 at 12:48:09PM +, Arnd Bergmann wrote:
> > On Friday 21 November 2014 09:35:10 Catalin Marinas wrote:
> > > +static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
> > > +{
> > > + if (!dma_supported(dev, mask))
> > > + return -EIO;
> > > + if (mask > dev->coherent_dma_mask)
> > > + mask &= of_dma_get_range_mask(dev);
> > > + dev->coherent_dma_mask = mask;
> > > + return 0;
> > > +}
> > 
> > There is an interesting side problem here: the dma mask points to
> > coherent_dma_mask for all devices probed from DT, so this breaks
> > if we have any driver that sets them to different values. It is a
> > preexisting problem, then.
> 
> Such drivers would have to set both masks separately (or call
> dma_set_mask_and_coherent). What we assume though is that dma-ranges
> refers to both dma_mask and coherent_dma_mask. I don't think that would
> be a problem for ARM systems.

Right, I'm just saying that we don't have a way to detect drivers that
break this assumption, not that we have a serious problem already.

> > >  EXPORT_SYMBOL_GPL(of_dma_get_range);
> > >  
> > > +u64 of_dma_get_range_mask(struct device *dev)
> > > +{
> > > + u64 dma_addr, paddr, size;
> > > +
> > > + /* no dma mask limiting if no of_node or no dma-ranges property */
> > > + if (!dev->of_node ||
> > > + of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size) < 0)
> > > + return DMA_BIT_MASK(64);
> > 
> > If no dma-ranges are present, we should assume that the bus only supports
> > 32-bit DMA, or we could make it architecture specific. It would probably
> > be best for arm64 to require a dma-ranges property for doing any DMA
> > at all, but we can't do that on arm32 any more now.
> 
> I thought about this but it could break some existing arm64 DT files if
> we mandate dma-ranges property (we could try though). The mask limiting
> is arch-specific anyway.

Yes, this has taken far too long to get addressed, we should have added
the properties right from the start. If we have a function in architecture
specific code, maybe we can just check for the short list of already
supported platforms that need backwards compatibility and require the
mask for everything else?

> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index 3b64d0bf5bba..50d1ac4739e6 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> > >   /* DMA ranges found. Calculate and set dma_pfn_offset */
> > >   dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> > >   dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> > > +
> > > + /* limit the coherent_dma_mask to the dma-ranges size property */
> > 
> > I would change the comment to clarify that we are actually changing
> > the dma_mask here as well.
> > 
> > > + if (size < (1ULL << 32))
> > > + dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> > >  }
> > 
> > As you mentioned in another mail in this thread, we wouldn't be
> > able to support this case on arm64.
> 
> The mask would still be valid and even usable if an IOMMU is present. Do
> you mean we should not limit it at all here?

The code is certainly correct on arm32, as long as we have an appropriate
DMA zone.

> There is a scenario where smaller mask would work on arm64. For example
> Juno, you can have 2GB of RAM in the 32-bit phys range (starting at
> 0x8000). A device with 31-bit mask and a dma_pfn_offset of
> 0x8000 would still work (there isn't any but just as an example). So
> the check in dma_alloc_coherent() would be something like:
> 
>   phys_to_dma(top of ZONE_DMA) - dma_pfn_offset <= coherent_dma_mask
> 
> (or assuming RAM starts at 0 and ignoring dma_pfn_offset for now)
> 
> If the condition above fails, dma_alloc_coherent() would no longer fall
> back to swiotlb but issue a dev_warn() and return NULL.

Ah, that looks like it should work on all architectures, very nice.
How about checking this condition, and then printing a small warning
(dev_warn, not WARN_ON) and setting the dma_mask pointer to NULL?

Arnd


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Fri, Nov 21, 2014 at 12:48:09PM +, Arnd Bergmann wrote:
> On Friday 21 November 2014 09:35:10 Catalin Marinas wrote:
> > +static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
> > +{
> > +   if (!dma_supported(dev, mask))
> > +   return -EIO;
> > +   if (mask > dev->coherent_dma_mask)
> > +   mask &= of_dma_get_range_mask(dev);
> > +   dev->coherent_dma_mask = mask;
> > +   return 0;
> > +}
> 
> There is an interesting side problem here: the dma mask points to
> coherent_dma_mask for all devices probed from DT, so this breaks
> if we have any driver that sets them to different values. It is a
> preexisting problem, then.

Such drivers would have to set both masks separately (or call
dma_set_mask_and_coherent). What we assume though is that dma-ranges
refers to both dma_mask and coherent_dma_mask. I don't think that would
be a problem for ARM systems.

> >  EXPORT_SYMBOL_GPL(of_dma_get_range);
> >  
> > +u64 of_dma_get_range_mask(struct device *dev)
> > +{
> > +   u64 dma_addr, paddr, size;
> > +
> > +   /* no dma mask limiting if no of_node or no dma-ranges property */
> > +   if (!dev->of_node ||
> > +   of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size) < 0)
> > +   return DMA_BIT_MASK(64);
> 
> If no dma-ranges are present, we should assume that the bus only supports
> 32-bit DMA, or we could make it architecture specific. It would probably
> be best for arm64 to require a dma-ranges property for doing any DMA
> at all, but we can't do that on arm32 any more now.

I thought about this but it could break some existing arm64 DT files if
we mandate dma-ranges property (we could try though). The mask limiting
is arch-specific anyway.

> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index 3b64d0bf5bba..50d1ac4739e6 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> > /* DMA ranges found. Calculate and set dma_pfn_offset */
> > dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> > dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> > +
> > +   /* limit the coherent_dma_mask to the dma-ranges size property */
> 
> I would change the comment to clarify that we are actually changing
> the dma_mask here as well.
> 
> > +   if (size < (1ULL << 32))
> > +   dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> >  }
> 
> As you mentioned in another mail in this thread, we wouldn't be
> able to support this case on arm64.

The mask would still be valid and even usable if an IOMMU is present. Do
you mean we should not limit it at all here?

There is a scenario where smaller mask would work on arm64. For example
Juno, you can have 2GB of RAM in the 32-bit phys range (starting at
0x8000). A device with 31-bit mask and a dma_pfn_offset of
0x8000 would still work (there isn't any but just as an example). So
the check in dma_alloc_coherent() would be something like:

phys_to_dma(top of ZONE_DMA) - dma_pfn_offset <= coherent_dma_mask

(or assuming RAM starts at 0 and ignoring dma_pfn_offset for now)

If the condition above fails, dma_alloc_coherent() would no longer fall
back to swiotlb but issue a dev_warn() and return NULL.

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-21 Thread Arnd Bergmann
On Friday 21 November 2014 09:35:10 Catalin Marinas wrote:
> On Thu, Nov 20, 2014 at 07:40:00AM +, Arnd Bergmann wrote:
> > On Thursday 20 November 2014 10:57:53 Ding Tianhong wrote:
> 
> But this wouldn't help Ding's case, here the driver needs to set the
> wider DMA mask.
> 
> Anyway, back to your point, to make sure I understand what you meant (I
> can send a proper patch with log afterwards):

Thanks for putting this into code!

> @@ -88,11 +89,24 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>  {
>   if (!dev->dma_mask || !dma_supported(dev, mask))
>   return -EIO;
> + /* if asking for bigger dma mask, limit it to the bus dma ranges */
> + if (mask > *dev->dma_mask)
> + mask &= of_dma_get_range_mask(dev);
>   *dev->dma_mask = mask;
>  
>   return 0;
>  }

As you commented later, the dma_supported check indeed needs to happen
after the masking.

> +static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
> +{
> + if (!dma_supported(dev, mask))
> + return -EIO;
> + if (mask > dev->coherent_dma_mask)
> + mask &= of_dma_get_range_mask(dev);
> + dev->coherent_dma_mask = mask;
> + return 0;
> +}

There is an interesting side problem here: the dma mask points to
coherent_dma_mask for all devices probed from DT, so this breaks
if we have any driver that sets them to different values. It is a
preexisting problem, then.

>  EXPORT_SYMBOL_GPL(of_dma_get_range);
>  
> +u64 of_dma_get_range_mask(struct device *dev)
> +{
> + u64 dma_addr, paddr, size;
> +
> + /* no dma mask limiting if no of_node or no dma-ranges property */
> + if (!dev->of_node ||
> + of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size) < 0)
> + return DMA_BIT_MASK(64);

If no dma-ranges are present, we should assume that the bus only supports
32-bit DMA, or we could make it architecture specific. It would probably
be best for arm64 to require a dma-ranges property for doing any DMA
at all, but we can't do that on arm32 any more now.

> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 3b64d0bf5bba..50d1ac4739e6 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
>   /* DMA ranges found. Calculate and set dma_pfn_offset */
>   dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
>   dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> +
> + /* limit the coherent_dma_mask to the dma-ranges size property */

I would change the comment to clarify that we are actually changing
the dma_mask here as well.

> + if (size < (1ULL << 32))
> + dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
>  }
>  


As you mentioned in another mail in this thread, we wouldn't be
able to support this case on arm64.

Arnd


Re: For the problem when using swiotlb

2014-11-21 Thread Arnd Bergmann
On Friday 21 November 2014 11:36:18 Catalin Marinas wrote:
> On Fri, Nov 21, 2014 at 11:26:45AM +, Arnd Bergmann wrote:
> > On Friday 21 November 2014 11:06:10 Catalin Marinas wrote:
> > > On Wed, Nov 19, 2014 at 03:56:42PM +, Arnd Bergmann wrote:
> > > > On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > > > > Going back to original topic, the dma_supported() function on arm64
> > > > > calls swiotlb_dma_supported() which actually checks whether the swiotlb
> > > > > bounce buffer is within the dma mask. This transparent bouncing (unlike
> > > > > arm32 where it needs to be explicit) is not always optimal, though
> > > > > required for 32-bit only devices on a 64-bit system. The problem is when
> > > > > the driver is 64-bit capable but forgets to call
> > > > > dma_set_mask_and_coherent() (that's not the only question I got about
> > > > > running out of swiotlb buffers).
> > > > 
> > > > I think it would be nice to warn once per device that starts using the
> > > > swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> > > > attached.
> > > 
> > > It would be nice to have a dev_warn_once().
> > > 
> > > I think it makes sense on arm64 to avoid swiotlb bounce buffers for
> > > coherent allocations altogether. The __dma_alloc_coherent() function
> > > already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
> > > have a device that cannot even cope with a 32-bit ZONE_DMA, we should
> > > just not support DMA at all on it (without an IOMMU). The arm32
> > > __dma_supported() has a similar check.
> > 
> > If we ever encounter this case, we may have to add a smaller ZONE_DMA
> > and use ZONE_DMA32 for the normal dma allocations.
> 
> Traditionally on x86 I think ZONE_DMA was for ISA and ZONE_DMA32 had to
> cover the 32-bit physical address space. On arm64 we don't expect ISA,
> so we only use ZONE_DMA (which is 4G, similar to IA-64, sparc). We had
> ZONE_DMA32 originally but it broke swiotlb which assumes ZONE_DMA for
> its bounce buffer.

Right, I'm just saying that we might encounter some broken hardware
that needs e.g. a 31-bit dma mask for one device, and we decide that
this chip is important enough that we have to support it.

We can of course hope that this won't happen.

Arnd


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Fri, Nov 21, 2014 at 11:26:45AM +, Arnd Bergmann wrote:
> On Friday 21 November 2014 11:06:10 Catalin Marinas wrote:
> > On Wed, Nov 19, 2014 at 03:56:42PM +, Arnd Bergmann wrote:
> > > On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > > > Going back to original topic, the dma_supported() function on arm64
> > > > calls swiotlb_dma_supported() which actually checks whether the swiotlb
> > > > bounce buffer is within the dma mask. This transparent bouncing (unlike
> > > > arm32 where it needs to be explicit) is not always optimal, though
> > > > required for 32-bit only devices on a 64-bit system. The problem is when
> > > > the driver is 64-bit capable but forgets to call
> > > > dma_set_mask_and_coherent() (that's not the only question I got about
> > > > running out of swiotlb buffers).
> > > 
> > > I think it would be nice to warn once per device that starts using the
> > > swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> > > attached.
> > 
> > It would be nice to have a dev_warn_once().
> > 
> > I think it makes sense on arm64 to avoid swiotlb bounce buffers for
> > coherent allocations altogether. The __dma_alloc_coherent() function
> > already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
> > have a device that cannot even cope with a 32-bit ZONE_DMA, we should
> > just not support DMA at all on it (without an IOMMU). The arm32
> > __dma_supported() has a similar check.
> 
> If we ever encounter this case, we may have to add a smaller ZONE_DMA
> and use ZONE_DMA32 for the normal dma allocations.

Traditionally on x86 I think ZONE_DMA was for ISA and ZONE_DMA32 had to
cover the 32-bit physical address space. On arm64 we don't expect ISA,
so we only use ZONE_DMA (which is 4G, similar to IA-64, sparc). We had
ZONE_DMA32 originally but it broke swiotlb which assumes ZONE_DMA for
its bounce buffer.

> > Swiotlb is still required for the streaming DMA since we get bouncing
> > for pages allocated outside the driver control (e.g. VFS layer which
> > doesn't care about GFP_DMA), hoping a 16M bounce buffer would be enough.
> > 
> > Ding seems to imply that CMA fixes the problem, which means that the
> > issue is indeed coherent allocations.
> 
> I wonder what's going on here, since swiotlb_alloc_coherent() actually
> tries a regular __get_free_pages(flags, order) first, and when ZONE_DMA
> is set here, it just works without using the pool.

As long as coherent_dma_mask is sufficient for ZONE_DMA. I have no idea
what this mask is set to in Ding's case (but I've seen the problem
previously with an out of tree driver where coherent_dma_mask was some
random number; so better reporting here would help).

There could be another case where dma_pfn_offset is required but let's
wait for some more info from Ding (ZONE_DMA is 32-bit from the start of
RAM which could be 40-bit like on Seattle, so basically such devices
would need to set dma_pfn_offset).

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-21 Thread Arnd Bergmann
On Friday 21 November 2014 11:06:10 Catalin Marinas wrote:
> On Wed, Nov 19, 2014 at 03:56:42PM +, Arnd Bergmann wrote:
> > On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > > Going back to original topic, the dma_supported() function on arm64
> > > calls swiotlb_dma_supported() which actually checks whether the swiotlb
> > > bounce buffer is within the dma mask. This transparent bouncing (unlike
> > > arm32 where it needs to be explicit) is not always optimal, though
> > > required for 32-bit only devices on a 64-bit system. The problem is when
> > > the driver is 64-bit capable but forgets to call
> > > dma_set_mask_and_coherent() (that's not the only question I got about
> > > running out of swiotlb buffers).
> > 
> > I think it would be nice to warn once per device that starts using the
> > swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> > attached.
> 
> It would be nice to have a dev_warn_once().
> 
> I think it makes sense on arm64 to avoid swiotlb bounce buffers for
> coherent allocations altogether. The __dma_alloc_coherent() function
> already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
> have a device that cannot even cope with a 32-bit ZONE_DMA, we should
> just not support DMA at all on it (without an IOMMU). The arm32
> __dma_supported() has a similar check.

If we ever encounter this case, we may have to add a smaller ZONE_DMA
and use ZONE_DMA32 for the normal dma allocations.

> Swiotlb is still required for the streaming DMA since we get bouncing
> for pages allocated outside the driver control (e.g. VFS layer which
> doesn't care about GFP_DMA), hoping a 16M bounce buffer would be enough.
> 
> Ding seems to imply that CMA fixes the problem, which means that the
> issue is indeed coherent allocations.

I wonder what's going on here, since swiotlb_alloc_coherent() actually
tries a regular __get_free_pages(flags, order) first, and when ZONE_DMA
is set here, it just works without using the pool.

Arnd


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Wed, Nov 19, 2014 at 03:56:42PM +, Arnd Bergmann wrote:
> On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > Going back to original topic, the dma_supported() function on arm64
> > calls swiotlb_dma_supported() which actually checks whether the swiotlb
> > bounce buffer is within the dma mask. This transparent bouncing (unlike
> > arm32 where it needs to be explicit) is not always optimal, though
> > required for 32-bit only devices on a 64-bit system. The problem is when
> > the driver is 64-bit capable but forgets to call
> > dma_set_mask_and_coherent() (that's not the only question I got about
> > running out of swiotlb buffers).
> 
> I think it would be nice to warn once per device that starts using the
> swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> attached.

It would be nice to have a dev_warn_once().

I think it makes sense on arm64 to avoid swiotlb bounce buffers for
coherent allocations altogether. The __dma_alloc_coherent() function
already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
have a device that cannot even cope with a 32-bit ZONE_DMA, we should
just not support DMA at all on it (without an IOMMU). The arm32
__dma_supported() has a similar check.

Swiotlb is still required for the streaming DMA since we get bouncing
for pages allocated outside the driver control (e.g. VFS layer which
doesn't care about GFP_DMA), hoping a 16M bounce buffer would be enough.

Ding seems to imply that CMA fixes the problem, which means that the
issue is indeed coherent allocations.

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Fri, Nov 21, 2014 at 09:35:10AM +, Catalin Marinas wrote:
> @@ -88,11 +89,24 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>  {
>   if (!dev->dma_mask || !dma_supported(dev, mask))
>   return -EIO;
> + /* if asking for bigger dma mask, limit it to the bus dma ranges */
> + if (mask > *dev->dma_mask)
> + mask &= of_dma_get_range_mask(dev);
>   *dev->dma_mask = mask;
>  
>   return 0;
>  }

I wonder whether the mask &= of_dma_get_range_mask(dev) limiting should
actually be done before checking dma_supported(). A device may try to
set a 64-bit mask while being connected to a more limiting bus, so we
should check dma_supported() on the resulting mask.

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Thu, Nov 20, 2014 at 07:40:00AM +, Arnd Bergmann wrote:
> On Thursday 20 November 2014 10:57:53 Ding Tianhong wrote:
> > On 2014/11/19 16:45, Arnd Bergmann wrote:
> > > On Wednesday 19 November 2014 11:17:15 Ding Tianhong wrote:
> > >> On 2014/11/18 2:09, Catalin Marinas wrote:
> > >>> On Mon, Nov 17, 2014 at 12:18:42PM +, Arnd Bergmann wrote:
> > >> Thanks everyone, I think I found the way to fix it: enable DMA_CMA to
> > >> reserve a big memory region for CMA and set the coherent mask for the
> > >> dev; then dma_alloc and dma_mapping will not use the swiotlb unless the
> > >> memory is outside the mask or swiotlb_force is enabled.
> > >>
> > >> If I still understand incorrectly, please inform me.
> > > 
> > > Please do not use CMA to work around the problem, but fix the
> > > underlying bug instead.
> > > 
> > > The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> > > dma mask, and check whether that succeeded. However, the code implementing
> > > dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> > > the dma-ranges property (see of_dma_configure()), and check if the mask
> > > is possible.
> > > 
> > The dma_pfn_offset looks like it only supports arm32, but my platform is
> > aarch64, and checking the latest kernel version I think dma-ranges still
> > does not work for aarch64, so maybe we should add dma_pfn_offset for
> > aarch64 first.
> 
> I didn't mean the dma_pfn_offset. The problem is that the of_dma_configure
> code currently doesn't look at the mask. As I explained in my reply to
> Catalin, it should set the mask to the size of the dma-ranges if that is
> 32-bit or smaller, and dma_set_mask should look at the same dma-ranges
> property to decide what to set the mask to when a driver asks for a
> mask larger than 64-bit.

But this wouldn't help Ding's case, here the driver needs to set the
wider DMA mask.

Anyway, back to your point, to make sure I understand what you meant (I
can send a proper patch with log afterwards):

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1e0e4671dd25..d6a4b4619174 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -120,6 +120,9 @@ config HAVE_GENERIC_RCU_GUP
 config ARCH_DMA_ADDR_T_64BIT
def_bool y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+   def_bool y
+
 config NEED_DMA_MAP_STATE
def_bool y
 
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index adeae3f6f0fc..92dcd251e549 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -18,6 +18,7 @@
 
 #ifdef __KERNEL__
 
+#include <linux/of_address.h>
 #include <linux/types.h>
 #include <linux/vmalloc.h>
 
@@ -88,11 +89,24 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
 {
if (!dev->dma_mask || !dma_supported(dev, mask))
return -EIO;
+   /* if asking for bigger dma mask, limit it to the bus dma ranges */
+   if (mask > *dev->dma_mask)
+   mask &= of_dma_get_range_mask(dev);
*dev->dma_mask = mask;
 
return 0;
 }
 
+static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+   if (!dma_supported(dev, mask))
+   return -EIO;
+   if (mask > dev->coherent_dma_mask)
+   mask &= of_dma_get_range_mask(dev);
+   dev->coherent_dma_mask = mask;
+   return 0;
+}
+
 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
 {
if (!dev->dma_mask)
diff --git a/drivers/of/address.c b/drivers/of/address.c
index afdb78299f61..89c04abdf9bb 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -1,5 +1,6 @@
 
 #include <linux/device.h>
+#include <linux/dma-mapping.h>
 #include <linux/io.h>
 #include <linux/ioport.h>
 #include <linux/module.h>
@@ -979,6 +980,19 @@ out:
 }
 EXPORT_SYMBOL_GPL(of_dma_get_range);
 
+u64 of_dma_get_range_mask(struct device *dev)
+{
+   u64 dma_addr, paddr, size;
+
+   /* no dma mask limiting if no of_node or no dma-ranges property */
+   if (!dev->of_node ||
+   of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size) < 0)
+   return DMA_BIT_MASK(64);
+
+   return DMA_BIT_MASK(ilog2(size));
+}
+EXPORT_SYMBOL_GPL(of_dma_get_range_mask);
+
 /**
  * of_dma_is_coherent - Check if device is coherent
  * @np:device node
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 3b64d0bf5bba..50d1ac4739e6 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
/* DMA ranges found. Calculate and set dma_pfn_offset */
dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
+
+   /* limit the coherent_dma_mask to the dma-ranges size property */
+   if (size < (1ULL << 32))
+   dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
 }
 
 /**
diff --git a/include/linux/of_address.h b/include/linux/of_address.h
index 8cb14eb393d6..fffb1a49a1a7 100644
--- a/include/linux/of_address.h
+++ 


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Fri, Nov 21, 2014 at 09:35:10AM +, Catalin Marinas wrote:
> @@ -88,11 +89,24 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>  {
> 	if (!dev->dma_mask || !dma_supported(dev, mask))
> 		return -EIO;
> +	/* if asking for bigger dma mask, limit it to the bus dma ranges */
> +	if (mask > *dev->dma_mask)
> +		mask &= of_dma_get_range_mask(dev);
> 	*dev->dma_mask = mask;
>  
> 	return 0;
>  }

I wonder whether the `mask &= of_dma_get_range_mask(dev)` limiting should
actually be done before checking dma_supported(). That's because a device
may try to set a 64-bit mask while being connected to a more limiting bus,
so we should check dma_supported() on the resulting mask.

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Wed, Nov 19, 2014 at 03:56:42PM +, Arnd Bergmann wrote:
> On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > Going back to original topic, the dma_supported() function on arm64
> > calls swiotlb_dma_supported() which actually checks whether the swiotlb
> > bounce buffer is within the dma mask. This transparent bouncing (unlike
> > arm32 where it needs to be explicit) is not always optimal, though
> > required for 32-bit only devices on a 64-bit system. The problem is when
> > the driver is 64-bit capable but forgets to call
> > dma_set_mask_and_coherent() (that's not the only question I got about
> > running out of swiotlb buffers).
> 
> I think it would be nice to warn once per device that starts using the
> swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> attached.

It would be nice to have a dev_warn_once().

I think it makes sense on arm64 to avoid swiotlb bounce buffers for
coherent allocations altogether. The __dma_alloc_coherent() function
already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
have a device that cannot even cope with a 32-bit ZONE_DMA, we should
just not support DMA at all on it (without an IOMMU). The arm32
__dma_supported() has a similar check.

Swiotlb is still required for the streaming DMA since we get bouncing
for pages allocated outside the driver control (e.g. VFS layer which
doesn't care about GFP_DMA), hoping a 16M bounce buffer would be enough.

Ding seems to imply that CMA fixes the problem, which means that the
issue is indeed coherent allocations.

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-21 Thread Arnd Bergmann
On Friday 21 November 2014 11:06:10 Catalin Marinas wrote:
> On Wed, Nov 19, 2014 at 03:56:42PM +, Arnd Bergmann wrote:
> > On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > > Going back to original topic, the dma_supported() function on arm64
> > > calls swiotlb_dma_supported() which actually checks whether the swiotlb
> > > bounce buffer is within the dma mask. This transparent bouncing (unlike
> > > arm32 where it needs to be explicit) is not always optimal, though
> > > required for 32-bit only devices on a 64-bit system. The problem is when
> > > the driver is 64-bit capable but forgets to call
> > > dma_set_mask_and_coherent() (that's not the only question I got about
> > > running out of swiotlb buffers).
> > 
> > I think it would be nice to warn once per device that starts using the
> > swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> > attached.
> 
> It would be nice to have a dev_warn_once().
> 
> I think it makes sense on arm64 to avoid swiotlb bounce buffers for
> coherent allocations altogether. The __dma_alloc_coherent() function
> already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
> have a device that cannot even cope with a 32-bit ZONE_DMA, we should
> just not support DMA at all on it (without an IOMMU). The arm32
> __dma_supported() has a similar check.

If we ever encounter this case, we may have to add a smaller ZONE_DMA
and use ZONE_DMA32 for the normal dma allocations.

> Swiotlb is still required for the streaming DMA since we get bouncing
> for pages allocated outside the driver control (e.g. VFS layer which
> doesn't care about GFP_DMA), hoping a 16M bounce buffer would be enough.
> 
> Ding seems to imply that CMA fixes the problem, which means that the
> issue is indeed coherent allocations.

I wonder what's going on here, since swiotlb_alloc_coherent() actually
tries a regular __get_free_pages(flags, order) first, and when ZONE_DMA
is set here, it just works without using the pool.

Arnd


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Fri, Nov 21, 2014 at 11:26:45AM +, Arnd Bergmann wrote:
> On Friday 21 November 2014 11:06:10 Catalin Marinas wrote:
> > On Wed, Nov 19, 2014 at 03:56:42PM +, Arnd Bergmann wrote:
> > > On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > > > Going back to original topic, the dma_supported() function on arm64
> > > > calls swiotlb_dma_supported() which actually checks whether the swiotlb
> > > > bounce buffer is within the dma mask. This transparent bouncing (unlike
> > > > arm32 where it needs to be explicit) is not always optimal, though
> > > > required for 32-bit only devices on a 64-bit system. The problem is when
> > > > the driver is 64-bit capable but forgets to call
> > > > dma_set_mask_and_coherent() (that's not the only question I got about
> > > > running out of swiotlb buffers).
> > > 
> > > I think it would be nice to warn once per device that starts using the
> > > swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> > > attached.
> > 
> > It would be nice to have a dev_warn_once().
> > 
> > I think it makes sense on arm64 to avoid swiotlb bounce buffers for
> > coherent allocations altogether. The __dma_alloc_coherent() function
> > already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
> > have a device that cannot even cope with a 32-bit ZONE_DMA, we should
> > just not support DMA at all on it (without an IOMMU). The arm32
> > __dma_supported() has a similar check.
> 
> If we ever encounter this case, we may have to add a smaller ZONE_DMA
> and use ZONE_DMA32 for the normal dma allocations.

Traditionally on x86 I think ZONE_DMA was for ISA and ZONE_DMA32 had to
cover the 32-bit physical address space. On arm64 we don't expect ISA,
so we only use ZONE_DMA (which is 4G, similar to IA-64, sparc). We had
ZONE_DMA32 originally but it broke swiotlb which assumes ZONE_DMA for
its bounce buffer.

> > Swiotlb is still required for the streaming DMA since we get bouncing
> > for pages allocated outside the driver control (e.g. VFS layer which
> > doesn't care about GFP_DMA), hoping a 16M bounce buffer would be enough.
> > 
> > Ding seems to imply that CMA fixes the problem, which means that the
> > issue is indeed coherent allocations.
> 
> I wonder what's going on here, since swiotlb_alloc_coherent() actually
> tries a regular __get_free_pages(flags, order) first, and when ZONE_DMA
> is set here, it just works without using the pool.

As long as coherent_dma_mask is sufficient for ZONE_DMA. I have no idea
what this mask is set to in Ding's case (but I've seen the problem
previously with an out of tree driver where coherent_dma_mask was some
random number; so better reporting here would help).

There could be another case where dma_pfn_offset is required but let's
wait for some more info from Ding (ZONE_DMA is 32-bit from the start of
RAM which could be 40-bit like on Seattle, so basically such devices
would need to set dma_pfn_offset).

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-21 Thread Arnd Bergmann
On Friday 21 November 2014 11:36:18 Catalin Marinas wrote:
> On Fri, Nov 21, 2014 at 11:26:45AM +, Arnd Bergmann wrote:
> > On Friday 21 November 2014 11:06:10 Catalin Marinas wrote:
> > > On Wed, Nov 19, 2014 at 03:56:42PM +, Arnd Bergmann wrote:
> > > > On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > > > > Going back to original topic, the dma_supported() function on arm64
> > > > > calls swiotlb_dma_supported() which actually checks whether the
> > > > > swiotlb bounce buffer is within the dma mask. This transparent
> > > > > bouncing (unlike arm32 where it needs to be explicit) is not always
> > > > > optimal, though required for 32-bit only devices on a 64-bit system.
> > > > > The problem is when the driver is 64-bit capable but forgets to call
> > > > > dma_set_mask_and_coherent() (that's not the only question I got about
> > > > > running out of swiotlb buffers).
> > > > 
> > > > I think it would be nice to warn once per device that starts using the
> > > > swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> > > > attached.
> > > 
> > > It would be nice to have a dev_warn_once().
> > > 
> > > I think it makes sense on arm64 to avoid swiotlb bounce buffers for
> > > coherent allocations altogether. The __dma_alloc_coherent() function
> > > already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
> > > have a device that cannot even cope with a 32-bit ZONE_DMA, we should
> > > just not support DMA at all on it (without an IOMMU). The arm32
> > > __dma_supported() has a similar check.
> > 
> > If we ever encounter this case, we may have to add a smaller ZONE_DMA
> > and use ZONE_DMA32 for the normal dma allocations.
> 
> Traditionally on x86 I think ZONE_DMA was for ISA and ZONE_DMA32 had to
> cover the 32-bit physical address space. On arm64 we don't expect ISA,
> so we only use ZONE_DMA (which is 4G, similar to IA-64, sparc). We had
> ZONE_DMA32 originally but it broke swiotlb which assumes ZONE_DMA for
> its bounce buffer.

Right, I'm just saying that we might encounter some broken hardware
that needs e.g. a 31-bit dma mask for one device, and we decide that
this chip is important enough that we have to support it.

We can of course hope that this won't happen.

Arnd


Re: For the problem when using swiotlb

2014-11-21 Thread Arnd Bergmann
On Friday 21 November 2014 09:35:10 Catalin Marinas wrote:
> On Thu, Nov 20, 2014 at 07:40:00AM +, Arnd Bergmann wrote:
> > On Thursday 20 November 2014 10:57:53 Ding Tianhong wrote:
> 
> But this wouldn't help Ding's case, here the driver needs to set the
> wider DMA mask.
> 
> Anyway, back to your point, to make sure I understand what you meant (I
> can send a proper patch with log afterwards):

Thanks for putting this into code!

> @@ -88,11 +89,24 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>  {
> 	if (!dev->dma_mask || !dma_supported(dev, mask))
> 		return -EIO;
> +	/* if asking for bigger dma mask, limit it to the bus dma ranges */
> +	if (mask > *dev->dma_mask)
> +		mask &= of_dma_get_range_mask(dev);
> 	*dev->dma_mask = mask;
>  
> 	return 0;
>  }

As you commented later, the dma_supported check indeed needs to happen
after the masking.

> +static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
> +{
> +	if (!dma_supported(dev, mask))
> +		return -EIO;
> +	if (mask > dev->coherent_dma_mask)
> +		mask &= of_dma_get_range_mask(dev);
> +	dev->coherent_dma_mask = mask;
> +	return 0;
> +}

There is an interesting side problem here: the dma mask points to
coherent_dma_mask for all devices probed from DT, so this breaks
if we have any driver that sets them to different values. It is a
preexisting problem then.

>  EXPORT_SYMBOL_GPL(of_dma_get_range);
>  
> +u64 of_dma_get_range_mask(struct device *dev)
> +{
> +	u64 dma_addr, paddr, size;
> +
> +	/* no dma mask limiting if no of_node or no dma-ranges property */
> +	if (!dev->of_node ||
> +	    of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size) < 0)
> +		return DMA_BIT_MASK(64);

If no dma-ranges are present, we should assume that the bus only supports
32-bit DMA, or we could make it architecture specific. It would probably
be best for arm64 to require a dma-ranges property for doing any DMA
at all, but we can't do that on arm32 any more now.

> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 3b64d0bf5bba..50d1ac4739e6 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> 	/* DMA ranges found. Calculate and set dma_pfn_offset */
> 	dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> 	dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> +
> +	/* limit the coherent_dma_mask to the dma-ranges size property */

I would change the comment to clarify that we are actually changing
the dma_mask here as well.

> +	if (size < (1ULL << 32))
> +		dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
>  }

As you mentioned in another mail in this thread, we wouldn't be
able to support this case on arm64.

Arnd


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Fri, Nov 21, 2014 at 12:48:09PM +, Arnd Bergmann wrote:
> On Friday 21 November 2014 09:35:10 Catalin Marinas wrote:
> > +static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
> > +{
> > +	if (!dma_supported(dev, mask))
> > +		return -EIO;
> > +	if (mask > dev->coherent_dma_mask)
> > +		mask &= of_dma_get_range_mask(dev);
> > +	dev->coherent_dma_mask = mask;
> > +	return 0;
> > +}
> 
> There is an interesting side problem here: the dma mask points to
> coherent_dma_mask for all devices probed from DT, so this breaks
> if we have any driver that sets them to different values. It is a
> preexisting problem then.

Such drivers would have to set both masks separately (or call
dma_set_mask_and_coherent). What we assume though is that dma-ranges
refers to both dma_mask and coherent_dma_mask. I don't think that would
be a problem for ARM systems.

> >  EXPORT_SYMBOL_GPL(of_dma_get_range);
> >  
> > +u64 of_dma_get_range_mask(struct device *dev)
> > +{
> > +	u64 dma_addr, paddr, size;
> > +
> > +	/* no dma mask limiting if no of_node or no dma-ranges property */
> > +	if (!dev->of_node ||
> > +	    of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size) < 0)
> > +		return DMA_BIT_MASK(64);
> 
> If no dma-ranges are present, we should assume that the bus only supports
> 32-bit DMA, or we could make it architecture specific. It would probably
> be best for arm64 to require a dma-ranges property for doing any DMA
> at all, but we can't do that on arm32 any more now.

I thought about this but it could break some existing arm64 DT files if
we mandate the dma-ranges property (we could try though). The mask
limiting is arch-specific anyway.

> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index 3b64d0bf5bba..50d1ac4739e6 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> > 	/* DMA ranges found. Calculate and set dma_pfn_offset */
> > 	dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> > 	dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> > +
> > +	/* limit the coherent_dma_mask to the dma-ranges size property */
> 
> I would change the comment to clarify that we are actually changing
> the dma_mask here as well.
> 
> > +	if (size < (1ULL << 32))
> > +		dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> >  }
> 
> As you mentioned in another mail in this thread, we wouldn't be
> able to support this case on arm64.

The mask would still be valid and even usable if an IOMMU is present. Do
you mean we should not limit it at all here?

There is a scenario where a smaller mask would work on arm64. For example
Juno, you can have 2GB of RAM in the 32-bit phys range (starting at
0x8000). A device with 31-bit mask and a dma_pfn_offset of
0x8000 would still work (there isn't any but just as an example). So
the check in dma_alloc_coherent() would be something like:

	phys_to_dma(top of ZONE_DMA) - dma_pfn_offset <= coherent_dma_mask

(or assuming RAM starts at 0 and ignoring dma_pfn_offset for now)

If the condition above fails, dma_alloc_coherent() would no longer fall
back to swiotlb but issue a dev_warn() and return NULL.

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-21 Thread Arnd Bergmann
On Friday 21 November 2014 16:57:09 Catalin Marinas wrote:
> On Fri, Nov 21, 2014 at 12:48:09PM +, Arnd Bergmann wrote:
> > On Friday 21 November 2014 09:35:10 Catalin Marinas wrote:
> > > +static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
> > > +{
> > > +	if (!dma_supported(dev, mask))
> > > +		return -EIO;
> > > +	if (mask > dev->coherent_dma_mask)
> > > +		mask &= of_dma_get_range_mask(dev);
> > > +	dev->coherent_dma_mask = mask;
> > > +	return 0;
> > > +}
> > 
> > There is an interesting side problem here: the dma mask points to
> > coherent_dma_mask for all devices probed from DT, so this breaks
> > if we have any driver that sets them to different values. It is a
> > preexisting problem then.
> 
> Such drivers would have to set both masks separately (or call
> dma_set_mask_and_coherent). What we assume though is that dma-ranges
> refers to both dma_mask and coherent_dma_mask. I don't think that would
> be a problem for ARM systems.

Right, I'm just saying that we don't have a way to detect drivers that
break this assumption, not that we have a serious problem already.

> > >  EXPORT_SYMBOL_GPL(of_dma_get_range);
> > >  
> > > +u64 of_dma_get_range_mask(struct device *dev)
> > > +{
> > > +	u64 dma_addr, paddr, size;
> > > +
> > > +	/* no dma mask limiting if no of_node or no dma-ranges property */
> > > +	if (!dev->of_node ||
> > > +	    of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size) < 0)
> > > +		return DMA_BIT_MASK(64);
> > 
> > If no dma-ranges are present, we should assume that the bus only supports
> > 32-bit DMA, or we could make it architecture specific. It would probably
> > be best for arm64 to require a dma-ranges property for doing any DMA
> > at all, but we can't do that on arm32 any more now.
> 
> I thought about this but it could break some existing arm64 DT files if
> we mandate the dma-ranges property (we could try though). The mask
> limiting is arch-specific anyway.

Yes, this has taken far too long to get addressed, we should have added
the properties right from the start. If we have a function in architecture
specific code, maybe we can just check for the short list of already
supported platforms that need backwards compatibility and require the
mask for everything else?

> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index 3b64d0bf5bba..50d1ac4739e6 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> > > 	/* DMA ranges found. Calculate and set dma_pfn_offset */
> > > 	dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> > > 	dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> > > +
> > > +	/* limit the coherent_dma_mask to the dma-ranges size property */
> > 
> > I would change the comment to clarify that we are actually changing
> > the dma_mask here as well.
> > 
> > > +	if (size < (1ULL << 32))
> > > +		dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> > >  }
> > 
> > As you mentioned in another mail in this thread, we wouldn't be
> > able to support this case on arm64.
> 
> The mask would still be valid and even usable if an IOMMU is present. Do
> you mean we should not limit it at all here?

The code is certainly correct on arm32, as long as we have an appropriate
DMA zone.

> There is a scenario where a smaller mask would work on arm64. For example
> Juno, you can have 2GB of RAM in the 32-bit phys range (starting at
> 0x8000). A device with 31-bit mask and a dma_pfn_offset of
> 0x8000 would still work (there isn't any but just as an example). So
> the check in dma_alloc_coherent() would be something like:
> 
> 	phys_to_dma(top of ZONE_DMA) - dma_pfn_offset <= coherent_dma_mask
> 
> (or assuming RAM starts at 0 and ignoring dma_pfn_offset for now)
> 
> If the condition above fails, dma_alloc_coherent() would no longer fall
> back to swiotlb but issue a dev_warn() and return NULL.

Ah, that looks like it should work on all architectures, very nice.
How about checking this condition, and then printing a small warning
(dev_warn, not WARN_ON) and setting the dma_mask pointer to NULL?

Arnd


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Fri, Nov 21, 2014 at 05:04:28PM +, Arnd Bergmann wrote:
> On Friday 21 November 2014 16:57:09 Catalin Marinas wrote:
> > There is a scenario where a smaller mask would work on arm64. For example
> > Juno, you can have 2GB of RAM in the 32-bit phys range (starting at
> > 0x8000). A device with 31-bit mask and a dma_pfn_offset of
> > 0x8000 would still work (there isn't any but just as an example). So
> > the check in dma_alloc_coherent() would be something like:
> > 
> > 	phys_to_dma(top of ZONE_DMA) - dma_pfn_offset <= coherent_dma_mask
> > 
> > (or assuming RAM starts at 0 and ignoring dma_pfn_offset for now)
> > 
> > If the condition above fails, dma_alloc_coherent() would no longer fall
> > back to swiotlb but issue a dev_warn() and return NULL.
> 
> Ah, that looks like it should work on all architectures, very nice.
> How about checking this condition, and then printing a small warning
> (dev_warn, not WARN_ON) and setting the dma_mask pointer to NULL?

I would not add the above ZONE_DMA check to of_dma_configure(). For
example on arm64, we may not support a small coherent_dma_mask but the
same value for dma_mask could be fine via swiotlb bouncing (or IOMMU).
However, that's an arch-specific decision. Maybe after the above setting
of dev->coherent_dma_mask in of_dma_configure(), we could add:

	if (!dma_supported(dev, dev->coherent_dma_mask))
		dev->dma_mask = NULL;

The arch dma_supported() can check the swiotlb bouncing or ZONE_DMA
limits.

Strangely, we don't have a coherent_dma_supported() but we can defer
such check to dma_alloc_coherent() and that's where we would check the
top of ZONE_DMA.

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-21 Thread Catalin Marinas
On Fri, Nov 21, 2014 at 05:51:19PM +, Catalin Marinas wrote:
> On Fri, Nov 21, 2014 at 05:04:28PM +, Arnd Bergmann wrote:
> > On Friday 21 November 2014 16:57:09 Catalin Marinas wrote:
> > > There is a scenario where smaller mask would work on arm64. For example
> > > Juno, you can have 2GB of RAM in the 32-bit phys range (starting at
> > > 0x8000). A device with 31-bit mask and a dma_pfn_offset of
> > > 0x8000 would still work (there isn't any but just as an example). So
> > > the check in dma_alloc_coherent() would be something like:
> > >
> > >   phys_to_dma(top of ZONE_DMA) - dma_pfn_offset <= coherent_dma_mask
> > >
> > > (or assuming RAM starts at 0 and ignoring dma_pfn_offset for now)
> > >
> > > If the condition above fails, dma_alloc_coherent() would no longer fall
> > > back to swiotlb but issue a dev_warn() and return NULL.
> >
> > Ah, that looks like it should work on all architectures, very nice.
> > How about checking this condition, and then printing a small warning
> > (dev_warn, not WARN_ON) and setting the dma_mask pointer to NULL?
>
> I would not add the above ZONE_DMA check to of_dma_configure(). For
> example on arm64, we may not support a small coherent_dma_mask but the
> same value for dma_mask could be fine via swiotlb bouncing (or IOMMU).
> However, that's an arch-specific decision. Maybe after the above setting
> of dev->coherent_dma_mask in of_dma_configure(), we could add:
>
>	if (!dma_supported(dev, dev->coherent_dma_mask))
>		dev->dma_mask = NULL;
>
> The arch dma_supported() can check the swiotlb bouncing or ZONE_DMA
> limits.

More precisely, that's what I meant:

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 3b64d0bf5bba..4fcdfed6a6df 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -172,13 +172,6 @@ static void of_dma_configure(struct device *dev)
 	dev->coherent_dma_mask = DMA_BIT_MASK(32);
 
 	/*
-	 * Set it to coherent_dma_mask by default if the architecture
-	 * code has not set it.
-	 */
-	if (!dev->dma_mask)
-		dev->dma_mask = &dev->coherent_dma_mask;
-
-	/*
 	 * if dma-coherent property exist, call arch hook to setup
 	 * dma coherent operations.
 	 */
@@ -188,18 +181,32 @@ static void of_dma_configure(struct device *dev)
 	}
 
 	/*
-	 * if dma-ranges property doesn't exist - just return else
-	 * setup the dma offset
+	 * If dma-ranges property exists - setup the dma offset.
 	 */
 	ret = of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size);
 	if (ret < 0) {
 		dev_dbg(dev, "no dma range information to setup\n");
-		return;
+		size = 0;
+	} else {
+		/* DMA ranges found. Calculate and set dma_pfn_offset */
+		dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
+		dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
 	}
 
-	/* DMA ranges found. Calculate and set dma_pfn_offset */
-	dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
-	dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
+	/*
+	 * If the bus dma-ranges property specifies a size smaller than 4GB,
+	 * the device would not be capable of accessing the whole 32-bit
+	 * space, so reduce the default coherent_dma_mask accordingly.
+	 */
+	if (size && size < (1ULL << 32))
+		dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
+
+	/*
+	 * Set dma_mask to coherent_dma_mask by default if the architecture
+	 * code has not set it and DMA on such mask is supported.
+	 */
+	if (!dev->dma_mask && dma_supported(dev, dev->coherent_dma_mask))
+		dev->dma_mask = &dev->coherent_dma_mask;
 }
 
 /**

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-20 Thread Ding Tianhong
On 2014/11/20 17:02, Arnd Bergmann wrote:
> On Thursday 20 November 2014 16:34:29 Ding Tianhong wrote:
>>>
>>> I didn't mean the dma_pfn_offset. The problem is that the of_dma_configure
>>> code currently doesn't look at the mask. As I explained in my reply to
>>> Catalin, it should set the mask to the size of the dma-ranges if that is
>>> 32-bit or smaller, and dma_set_mask should look at the same dma-ranges
>>> property to decide what to set the mask to when a driver asks for a
>>> mask larger than 64-bit.
>>>
>> ok, I think the your reply to catalin is clear, I got it, add a
>> appropriate mask for the dev is
>> reasonable, I think it should be fixed later.
>>
>> But in my case, if I don't use the DMA_CMA, the dma_alloc_coherent
>> should use the swiotlb directly which maximum is 16M,
>> so unless I use the kmalloc otherwise I have no better idea for that.
> 
> But if you just change your driver to set the 64-bit DMA mask,
> it will no longer use the swiotlb and work much faster without
> that. You have to do it anyway, even once we fix the architecture
> code.
> 

Yes, I did, it is useful, thanks.

> The fix in the architecture code is just required so things don't
> break if you have a 64-bit DMA capable device on a 32-bit bus.
> 
> On a related note, we should really fix the network and scsi core
> code to not just look at PCI_DMA_BUS_IS_PHYS but check the presence
> of an IOMMU for devices that are limited to 32-bit DMA, so we fall
> back to block bounce and lowmem SKB instead of swiotlb for IOMMU-less
> devices.
> 
agree.

> Which driver are you using BTW?
> 
SoC PPE drivers, which are not upstream yet, and SAS drivers.

>   Arnd

Regards
Ding

> 
> .
> 




Re: For the problem when using swiotlb

2014-11-20 Thread Arnd Bergmann
On Thursday 20 November 2014 16:34:29 Ding Tianhong wrote:
> > 
> > I didn't mean the dma_pfn_offset. The problem is that the of_dma_configure
> > code currently doesn't look at the mask. As I explained in my reply to
> > Catalin, it should set the mask to the size of the dma-ranges if that is
> > 32-bit or smaller, and dma_set_mask should look at the same dma-ranges
> > property to decide what to set the mask to when a driver asks for a
> > mask larger than 64-bit.
> > 
> ok, I think the your reply to catalin is clear, I got it, add a
> appropriate mask for the dev is
> reasonable, I think it should be fixed later.
> 
> But in my case, if I don't use the DMA_CMA, the dma_alloc_coherent
> should use the swiotlb directly which maximum is 16M,
> so unless I use the kmalloc otherwise I have no better idea for that.

But if you just change your driver to set the 64-bit DMA mask,
it will no longer use the swiotlb and work much faster without
that. You have to do it anyway, even once we fix the architecture
code.

The fix in the architecture code is just required so things don't
break if you have a 64-bit DMA capable device on a 32-bit bus.

On a related note, we should really fix the network and scsi core
code to not just look at PCI_DMA_BUS_IS_PHYS but check the presence
of an IOMMU for devices that are limited to 32-bit DMA, so we fall
back to block bounce and lowmem SKB instead of swiotlb for IOMMU-less
devices.

Which driver are you using BTW?

Arnd


Re: For the problem when using swiotlb

2014-11-20 Thread Ding Tianhong
On 2014/11/20 15:40, Arnd Bergmann wrote:
> On Thursday 20 November 2014 10:57:53 Ding Tianhong wrote:
>> On 2014/11/19 16:45, Arnd Bergmann wrote:
>>> On Wednesday 19 November 2014 11:17:15 Ding Tianhong wrote:
 On 2014/11/18 2:09, Catalin Marinas wrote:
> On Mon, Nov 17, 2014 at 12:18:42PM +, Arnd Bergmann wrote:
 Thanks everyone, I think I found the way to fix it, need to enable 
 DMA_CMA, to reserve a big memory
 for CMA and set coherent mask for dev, then dma_alloc and dma_mapping will 
 not use the swiotlb until
 the memory out of mask or swiotlb_force is enabled.

 If I still understand uncorrectly, please inform me.

>>>
>>> Please do not use CMA to work around the problem, but fix the underlying bug
>>> instead.
>>>
>>> The driver should call 'dma_set_mask_and_coherent()' with the appropriate
>>> dma mask, and check whether that succeeded. However, the code implementing
>>> dma_set_mask_and_coherent on arm64 also needs to be changed to look up
>>> the dma-ranges property (see of_dma_configure()), and check if the mask
>>> is possible.
>>>
>> The dma_pfn_offset looks only support arm32, but my platform is aarch64 and 
>> I check the latest kernel version, 
>> I think the dma-rangs still could not work for aarch64, so maybe we should 
>> add dma_pfn_offset for aarch64 first.
>>
> 
> I didn't mean the dma_pfn_offset. The problem is that the of_dma_configure
> code currently doesn't look at the mask. As I explained in my reply to
> Catalin, it should set the mask to the size of the dma-ranges if that is
> 32-bit or smaller, and dma_set_mask should look at the same dma-ranges
> property to decide what to set the mask to when a driver asks for a
> mask larger than 64-bit.
> 
>   Arnd
> 
OK, I think your reply to Catalin is clear, I got it: adding an appropriate
mask for the dev is reasonable, and I think it should be fixed later.

But in my case, if I don't use DMA_CMA, dma_alloc_coherent() will use the
swiotlb directly, whose maximum is 16M, so unless I use kmalloc I have no
better idea for that.

Regards
Ding 

> .
> 


Re: For the problem when using swiotlb

2014-11-19 Thread Arnd Bergmann
On Thursday 20 November 2014 10:57:53 Ding Tianhong wrote:
> On 2014/11/19 16:45, Arnd Bergmann wrote:
> > On Wednesday 19 November 2014 11:17:15 Ding Tianhong wrote:
> >> On 2014/11/18 2:09, Catalin Marinas wrote:
> >>> On Mon, Nov 17, 2014 at 12:18:42PM +, Arnd Bergmann wrote:
> >> Thanks everyone, I think I found the way to fix it, need to enable 
> >> DMA_CMA, to reserve a big memory
> >> for CMA and set coherent mask for dev, then dma_alloc and dma_mapping will 
> >> not use the swiotlb until
> >> the memory out of mask or swiotlb_force is enabled.
> >>
> >> If I still understand uncorrectly, please inform me.
> >>
> > 
> > Please do not use CMA to work around the problem, but fix the underlying bug
> > instead.
> > 
> > The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> > dma mask, and check whether that succeeded. However, the code implementing
> > dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> > the dma-ranges property (see of_dma_configure()), and check if the mask
> > is possible.
> > 
> The dma_pfn_offset looks only support arm32, but my platform is aarch64 and I 
> check the latest kernel version, 
> I think the dma-rangs still could not work for aarch64, so maybe we should 
> add dma_pfn_offset for aarch64 first.
> 

I didn't mean the dma_pfn_offset. The problem is that the of_dma_configure
code currently doesn't look at the mask. As I explained in my reply to
Catalin, it should set the mask to the size of the dma-ranges if that is
32-bit or smaller, and dma_set_mask should look at the same dma-ranges
property to decide what to set the mask to when a driver asks for a
mask larger than 64-bit.

Arnd


Re: For the problem when using swiotlb

2014-11-19 Thread Ding Tianhong
On 2014/11/19 16:45, Arnd Bergmann wrote:
> On Wednesday 19 November 2014 11:17:15 Ding Tianhong wrote:
>> On 2014/11/18 2:09, Catalin Marinas wrote:
>>> On Mon, Nov 17, 2014 at 12:18:42PM +, Arnd Bergmann wrote:
 On Monday 17 November 2014 19:56:27 Ding Tianhong wrote:
> The commit 3690951fc6d42f3a0903987677d0e592c49dd8db(arm64: Use 
> swiotlb late initialisation)
> switches the DMA mapping code to 
> swiotlb_tlb_late_init_with_default_size(), this will occur a problem
> when I run the scsi stress tests, the message as below:
>
> sas_controller b100.sas: swiotlb buffer is full (sz: 65536 
> bytes)..
> DMA: Out of SW-IOMMU space for 65536 bytes at device b100.sas
>
> The reason is that the swiotlb_tlb_late_init_with_default_size() could 
> only alloc 16M memory for DMA-mapping,
> and the param in cmdline "swiotlb=xxx" is useless because the 
> get_free_pages() only use the buddy to assigned a
> maximum memory of 16M(The MAX_ORDER is 13 for 4k pages), obviously 16M is 
> too small in many scenes, but
> the swiotlb_init() which could reserved a bigger memory as wished could 
> work well for most drivers.
>
> I could not get a better way to fix this problem except to revert this 
> patch, so could you please give me some
> advise and help me, thanks very much.

 In general, you should not need to use swiotlb for most devices, in
 particular for high-performance devices like network or block.

 Please make sure that you have set up the dma-ranges properties in
 your DT properly to allow 64-bit DMA if the device supports it.
>>>
>>> That's the problem indeed, the DMA API ends up using swiotlb bounce
>>> buffers because the physical address of the pages passed to (or
>>> allocated by) the driver are beyond 32-bit limit (which is the default
>>> dma mask).
>>>
>>
>> Thanks everyone, I think I found the way to fix it, need to enable DMA_CMA, 
>> to reserve a big memory
>> for CMA and set coherent mask for dev, then dma_alloc and dma_mapping will 
>> not use the swiotlb until
>> the memory out of mask or swiotlb_force is enabled.
>>
>> If I still understand uncorrectly, please inform me.
>>
> 
> Please do not use CMA to work around the problem, but fix the underlying bug
> instead.
> 
> The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> dma mask, and check whether that succeeded. However, the code implementing
> dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> the dma-ranges property (see of_dma_configure()), and check if the mask
> is possible.
> 
>   Arnd
> 

The dma_pfn_offset looks to be supported only on arm32, but my platform is
aarch64, and I checked the latest kernel version; I think dma-ranges still
could not work for aarch64, so maybe we should add dma_pfn_offset for
aarch64 first.

Ding

> .
> 




Re: For the problem when using swiotlb

2014-11-19 Thread Arnd Bergmann
On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> On Wed, Nov 19, 2014 at 12:48:58PM +, Arnd Bergmann wrote:
> > On Wednesday 19 November 2014 11:29:10 Catalin Marinas wrote:
> > > > The driver should call 'dma_set_mask_and_coherent()' with the 
> > > > appropriate
> > > > dma mask, and check whether that succeeded. However, the code 
> > > > implementing
> > > > dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> > > > the dma-ranges property (see of_dma_configure()), and check if the mask
> > > > is possible.
> > > 
> > > dma_set_mask_and_coherent() is a generic function. I think the
> > > of_dma_configure() should start with a coherent_dma_mask based on
> > > dma-ranges if given rather than defaulting to DMA_BIT_MASK(32). The
> > > comment in of_dma_configure() says that devices should set up the
> > > supported mask but it's not always up to them but the bus they are
> > > connected to.
> > > 
> > > Something like below, untested:
> > > 
> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index 3b64d0bf5bba..dff34883db45 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> > > /* DMA ranges found. Calculate and set dma_pfn_offset */
> > > dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> > > dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> > > +
> > > +   /* Set the coherent_dma_mask based on the dma-ranges property */
> > > +   if (size)
> > > +   dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> > >  }
> > 
> > We have discussed this in the past, and the problem with this is
> > that the actual mask depends both on the capabilities of the
> > device and the bus. In particular, if the device can only do
> > 32-bit DMA, we must not set the mask to something higher.
> 
> So is the dma-ranges property allowed to specify a size bigger than what
> the device supports?

dma-ranges is a property that describes the bus, but you can have
devices with different capabilities on the same bus. In particular
on PCI, you will have 32-bit and 64-bit masters on the same host
controller.

> > The normal rule here is that a driver that wants to do 64-bit
> > DMA must call dma_set_mask_and_coherent() with the higher mask,
> > while a device that can not access all of the 32-bit address space
> > must call dma_set_mask_and_coherent() with the smaller mask
> > before doing calling any of the other DMA interfaces.
> 
> OK, looking at the DMA API docs, it looks like the default mask is
> 32-bit and any other value should be explicitly set by the driver.
> 
> What we don't have on arm64 yet is taking dma_pfn_offset into account
> when generating the dma address (but so far I haven't seen any request
> for this from hardware vendors; it can easily be fixed). So if that's
> not the case for Ding, I'm not sure dma-ranges property would help.

You can ignore dma_pfn_offset for now, until someone needs it. What
we definitely need is the size of the range.

> > However, if the bus is not capable of addressing the entire
> > 32-bit range (as on some modern shmobile machines, or some of the
> > really old machines), we need to limit the mask here already.
> 
> Would the limiting be based on the dma-ranges size property?

Yes.

> Such information is not easily available after of_dma_configure(),
> maybe we could store it somewhere in struct device.

I don't think it's necessary. Setting the dma mask is a rare operation,
and we only need to check again if the new mask is larger than the
old one.

> Going back to original topic, the dma_supported() function on arm64
> calls swiotlb_dma_supported() which actually checks whether the swiotlb
> bounce buffer is within the dma mask. This transparent bouncing (unlike
> arm32 where it needs to be explicit) is not always optimal, though
> required for 32-bit only devices on a 64-bit system. The problem is when
> the driver is 64-bit capable but forgets to call
> dma_set_mask_and_coherent() (that's not the only question I got about
> running out of swiotlb buffers).

I think it would be nice to warn once per device that starts using the
swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
attached.

Arnd


Re: For the problem when using swiotlb

2014-11-19 Thread Catalin Marinas
On Wed, Nov 19, 2014 at 12:48:58PM +, Arnd Bergmann wrote:
> On Wednesday 19 November 2014 11:29:10 Catalin Marinas wrote:
> > > The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> > > dma mask, and check whether that succeeded. However, the code implementing
> > > dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> > > the dma-ranges property (see of_dma_configure()), and check if the mask
> > > is possible.
> > 
> > dma_set_mask_and_coherent() is a generic function. I think the
> > of_dma_configure() should start with a coherent_dma_mask based on
> > dma-ranges if given rather than defaulting to DMA_BIT_MASK(32). The
> > comment in of_dma_configure() says that devices should set up the
> > supported mask but it's not always up to them but the bus they are
> > connected to.
> > 
> > Something like below, untested:
> > 
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index 3b64d0bf5bba..dff34883db45 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> > /* DMA ranges found. Calculate and set dma_pfn_offset */
> > dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> > dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> > +
> > +   /* Set the coherent_dma_mask based on the dma-ranges property */
> > +   if (size)
> > +   dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> >  }
> 
> We have discussed this in the past, and the problem with this is
> that the actual mask depends both on the capabilities of the
> device and the bus. In particular, if the device can only do
> 32-bit DMA, we must not set the mask to something higher.

So is the dma-ranges property allowed to specify a size bigger than what
the device supports?

> The normal rule here is that a driver that wants to do 64-bit
> DMA must call dma_set_mask_and_coherent() with the higher mask,
> while a device that can not access all of the 32-bit address space
> must call dma_set_mask_and_coherent() with the smaller mask
> before doing calling any of the other DMA interfaces.

OK, looking at the DMA API docs, it looks like the default mask is
32-bit and any other value should be explicitly set by the driver.

What we don't have on arm64 yet is taking dma_pfn_offset into account
when generating the dma address (but so far I haven't seen any request
for this from hardware vendors; it can easily be fixed). So if that's
not the case for Ding, I'm not sure dma-ranges property would help.

> However, if the bus is not capable of addressing the entire
> 32-bit range (as on some modern shmobile machines, or some of the
> really old machines), we need to limit the mask here already.

Would the limiting be based on the dma-ranges size property? Such
information is not easily available after of_dma_configure(), maybe we
could store it somewhere in struct device.

Going back to original topic, the dma_supported() function on arm64
calls swiotlb_dma_supported() which actually checks whether the swiotlb
bounce buffer is within the dma mask. This transparent bouncing (unlike
arm32 where it needs to be explicit) is not always optimal, though
required for 32-bit only devices on a 64-bit system. The problem is when
the driver is 64-bit capable but forgets to call
dma_set_mask_and_coherent() (that's not the only question I got about
running out of swiotlb buffers).

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-19 Thread Arnd Bergmann
On Wednesday 19 November 2014 11:29:10 Catalin Marinas wrote:
> > The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> > dma mask, and check whether that succeeded. However, the code implementing
> > dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> > the dma-ranges property (see of_dma_configure()), and check if the mask
> > is possible.
> 
> dma_set_mask_and_coherent() is a generic function. I think the
> of_dma_configure() should start with a coherent_dma_mask based on
> dma-ranges if given rather than defaulting to DMA_BIT_MASK(32). The
> comment in of_dma_configure() says that devices should set up the
> supported mask but it's not always up to them but the bus they are
> connected to.
> 
> Something like below, untested:
> 
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 3b64d0bf5bba..dff34883db45 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> /* DMA ranges found. Calculate and set dma_pfn_offset */
> dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> +
> +   /* Set the coherent_dma_mask based on the dma-ranges property */
> +   if (size)
> +   dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
>  }
>  
> 

We have discussed this in the past, and the problem with this is
that the actual mask depends both on the capabilities of the
device and the bus. In particular, if the device can only do
32-bit DMA, we must not set the mask to something higher.

The normal rule here is that a driver that wants to do 64-bit
DMA must call dma_set_mask_and_coherent() with the higher mask,
while a device that can not access all of the 32-bit address space
must call dma_set_mask_and_coherent() with the smaller mask
before calling any of the other DMA interfaces.

However, if the bus is not capable of addressing the entire
32-bit range (as on some modern shmobile machines, or some of the
really old machines), we need to limit the mask here already.

Arnd


Re: For the problem when using swiotlb

2014-11-19 Thread Catalin Marinas
On Wed, Nov 19, 2014 at 08:45:43AM +, Arnd Bergmann wrote:
> On Wednesday 19 November 2014 11:17:15 Ding Tianhong wrote:
> > On 2014/11/18 2:09, Catalin Marinas wrote:
> > > On Mon, Nov 17, 2014 at 12:18:42PM +, Arnd Bergmann wrote:
> > >> On Monday 17 November 2014 19:56:27 Ding Tianhong wrote:
> > >>> The commit 3690951fc6d42f3a0903987677d0e592c49dd8db ("arm64: Use
> > >>> swiotlb late initialisation") switches the DMA mapping code to
> > >>> swiotlb_tlb_late_init_with_default_size(), and this causes a problem
> > >>> when I run the scsi stress tests; the message is as below:
> > >>>
> > >>> sas_controller b100.sas: swiotlb buffer is full (sz: 65536 bytes)
> > >>> DMA: Out of SW-IOMMU space for 65536 bytes at device b100.sas
> > >>>
> > >>> The reason is that swiotlb_tlb_late_init_with_default_size() can
> > >>> only allocate 16M of memory for DMA mapping, and the cmdline
> > >>> parameter "swiotlb=xxx" is useless because get_free_pages() can only
> > >>> assign a maximum of 16M from the buddy allocator (MAX_ORDER is 13
> > >>> for 4k pages). Obviously 16M is too small in many scenarios, but
> > >>> swiotlb_init(), which can reserve a bigger memory region as wished,
> > >>> works well for most drivers.
> > >>>
> > >>> I could not find a better way to fix this problem except to revert
> > >>> this patch, so could you please give me some advice and help me,
> > >>> thanks very much.
> > >>
> > >> In general, you should not need to use swiotlb for most devices, in
> > >> particular for high-performance devices like network or block.
> > >>
> > >> Please make sure that you have set up the dma-ranges properties in
> > >> your DT properly to allow 64-bit DMA if the device supports it.
> > > 
> > > That's the problem indeed, the DMA API ends up using swiotlb bounce
> > > buffers because the physical addresses of the pages passed to (or
> > > allocated by) the driver are beyond the 32-bit limit (which is the
> > > default dma mask).
> > > 
> > 
> > Thanks everyone, I think I found a way to fix it: enable DMA_CMA to
> > reserve a big memory region for CMA and set the coherent mask for the
> > dev; then dma_alloc and dma_mapping will not use the swiotlb until the
> > memory is outside the mask or swiotlb_force is enabled.
> > 
> > If I still understand incorrectly, please inform me.
> 
> Please do not use CMA to work around the problem, but fix the underlying bug
> instead.

Agree.

> The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> dma mask, and check whether that succeeded. However, the code implementing
> dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> the dma-ranges property (see of_dma_configure()), and check if the mask
> is possible.

dma_set_mask_and_coherent() is a generic function. I think
of_dma_configure() should start with a coherent_dma_mask based on
dma-ranges, if given, rather than defaulting to DMA_BIT_MASK(32). The
comment in of_dma_configure() says that devices should set up the
supported mask, but it's not always up to them; it can depend on the bus
they are connected to.

Something like below, untested:

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 3b64d0bf5bba..dff34883db45 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
/* DMA ranges found. Calculate and set dma_pfn_offset */
dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
+
+   /* Set the coherent_dma_mask based on the dma-ranges property */
+   if (size)
+   dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
 }
 
 /**


Re: For the problem when using swiotlb

2014-11-19 Thread Arnd Bergmann
On Wednesday 19 November 2014 11:17:15 Ding Tianhong wrote:
> On 2014/11/18 2:09, Catalin Marinas wrote:
> > On Mon, Nov 17, 2014 at 12:18:42PM +0000, Arnd Bergmann wrote:
> >> On Monday 17 November 2014 19:56:27 Ding Tianhong wrote:
> >>> The commit 3690951fc6d42f3a0903987677d0e592c49dd8db ("arm64: Use
> >>> swiotlb late initialisation") switches the DMA mapping code to
> >>> swiotlb_tlb_late_init_with_default_size(), and this causes a problem
> >>> when I run the scsi stress tests; the message is as below:
> >>>
> >>> sas_controller b100.sas: swiotlb buffer is full (sz: 65536 bytes)
> >>> DMA: Out of SW-IOMMU space for 65536 bytes at device b100.sas
> >>>
> >>> The reason is that swiotlb_tlb_late_init_with_default_size() can
> >>> only allocate 16M of memory for DMA mapping, and the cmdline
> >>> parameter "swiotlb=xxx" is useless because get_free_pages() can only
> >>> assign a maximum of 16M from the buddy allocator (MAX_ORDER is 13
> >>> for 4k pages). Obviously 16M is too small in many scenarios, but
> >>> swiotlb_init(), which can reserve a bigger memory region as wished,
> >>> works well for most drivers.
> >>>
> >>> I could not find a better way to fix this problem except to revert
> >>> this patch, so could you please give me some advice and help me,
> >>> thanks very much.
> >>
> >> In general, you should not need to use swiotlb for most devices, in
> >> particular for high-performance devices like network or block.
> >>
> >> Please make sure that you have set up the dma-ranges properties in
> >> your DT properly to allow 64-bit DMA if the device supports it.
> > 
> > That's the problem indeed, the DMA API ends up using swiotlb bounce
> > buffers because the physical addresses of the pages passed to (or
> > allocated by) the driver are beyond the 32-bit limit (which is the
> > default dma mask).
> > 
> 
> Thanks everyone, I think I found a way to fix it: enable DMA_CMA to
> reserve a big memory region for CMA and set the coherent mask for the
> dev; then dma_alloc and dma_mapping will not use the swiotlb until the
> memory is outside the mask or swiotlb_force is enabled.
> 
> If I still understand incorrectly, please inform me.
> 

Please do not use CMA to work around the problem, but fix the underlying bug
instead.

The driver should call 'dma_set_mask_and_coherent()' with the appropriate
dma mask, and check whether that succeeded. However, the code implementing
dma_set_mask_and_coherent on arm64 also needs to be changed to look up
the dma-ranges property (see of_dma_configure()), and check if the mask
is possible.

Arnd


Re: For the problem when using swiotlb

2014-11-19 Thread Catalin Marinas
On Wed, Nov 19, 2014 at 12:48:58PM +0000, Arnd Bergmann wrote:
> On Wednesday 19 November 2014 11:29:10 Catalin Marinas wrote:
> > > The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> > > dma mask, and check whether that succeeded. However, the code implementing
> > > dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> > > the dma-ranges property (see of_dma_configure()), and check if the mask
> > > is possible.
> > 
> > dma_set_mask_and_coherent() is a generic function. I think
> > of_dma_configure() should start with a coherent_dma_mask based on
> > dma-ranges, if given, rather than defaulting to DMA_BIT_MASK(32). The
> > comment in of_dma_configure() says that devices should set up the
> > supported mask, but it's not always up to them; it can depend on the bus
> > they are connected to.
> > 
> > Something like below, untested:
> > 
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index 3b64d0bf5bba..dff34883db45 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> >  	/* DMA ranges found. Calculate and set dma_pfn_offset */
> >  	dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> >  	dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> > +
> > +	/* Set the coherent_dma_mask based on the dma-ranges property */
> > +	if (size)
> > +		dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> >  }
> 
> We have discussed this in the past, and the problem with this is
> that the actual mask depends both on the capabilities of the
> device and the bus. In particular, if the device can only do
> 32-bit DMA, we must not set the mask to something higher.

So is the dma-ranges property allowed to specify a size bigger than what
the device supports?

> The normal rule here is that a driver that wants to do 64-bit
> DMA must call dma_set_mask_and_coherent() with the higher mask,
> while a device that cannot access all of the 32-bit address space
> must call dma_set_mask_and_coherent() with the smaller mask
> before calling any of the other DMA interfaces.

OK, looking at the DMA API docs, it looks like the default mask is
32-bit and any other value should be explicitly set by the driver.

What we don't have on arm64 yet is taking dma_pfn_offset into account
when generating the dma address (but so far I haven't seen any request
for this from hardware vendors; it can easily be fixed). So if that's
not the case for Ding, I'm not sure the dma-ranges property would help.

> However, if the bus is not capable of addressing the entire
> 32-bit range (as on some modern shmobile machines, or some of the
> really old machines), we need to limit the mask here already.

Would the limiting be based on the dma-ranges size property? Such
information is not easily available after of_dma_configure(), maybe we
could store it somewhere in struct device.

Going back to the original topic, the dma_supported() function on arm64
calls swiotlb_dma_supported(), which actually checks whether the swiotlb
bounce buffer is within the dma mask. This transparent bouncing (unlike
arm32, where it needs to be explicit) is not always optimal, though it
is required for 32-bit-only devices on a 64-bit system. The problem is
when the driver is 64-bit capable but forgets to call
dma_set_mask_and_coherent() (that's not the only question I got about
running out of swiotlb buffers).

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-19 Thread Arnd Bergmann
On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> On Wed, Nov 19, 2014 at 12:48:58PM +0000, Arnd Bergmann wrote:
> > On Wednesday 19 November 2014 11:29:10 Catalin Marinas wrote:
> > > > The driver should call 'dma_set_mask_and_coherent()' with the
> > > > appropriate dma mask, and check whether that succeeded. However, the
> > > > code implementing dma_set_mask_and_coherent on arm64 also needs to be
> > > > changed to look up the dma-ranges property (see of_dma_configure()),
> > > > and check if the mask is possible.
> > > 
> > > dma_set_mask_and_coherent() is a generic function. I think
> > > of_dma_configure() should start with a coherent_dma_mask based on
> > > dma-ranges, if given, rather than defaulting to DMA_BIT_MASK(32). The
> > > comment in of_dma_configure() says that devices should set up the
> > > supported mask, but it's not always up to them; it can depend on the
> > > bus they are connected to.
> > > 
> > > Something like below, untested:
> > > 
> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index 3b64d0bf5bba..dff34883db45 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> > >  	/* DMA ranges found. Calculate and set dma_pfn_offset */
> > >  	dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> > >  	dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> > > +
> > > +	/* Set the coherent_dma_mask based on the dma-ranges property */
> > > +	if (size)
> > > +		dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> > >  }
> > 
> > We have discussed this in the past, and the problem with this is
> > that the actual mask depends both on the capabilities of the
> > device and the bus. In particular, if the device can only do
> > 32-bit DMA, we must not set the mask to something higher.
> 
> So is the dma-ranges property allowed to specify a size bigger than what
> the device supports?

dma-ranges is a property that describes the bus, but you can have
devices with different capabilities on the same bus. In particular
on PCI, you will have 32-bit and 64-bit masters on the same host
controller.

> > The normal rule here is that a driver that wants to do 64-bit
> > DMA must call dma_set_mask_and_coherent() with the higher mask,
> > while a device that cannot access all of the 32-bit address space
> > must call dma_set_mask_and_coherent() with the smaller mask
> > before calling any of the other DMA interfaces.
> 
> OK, looking at the DMA API docs, it looks like the default mask is
> 32-bit and any other value should be explicitly set by the driver.
> 
> What we don't have on arm64 yet is taking dma_pfn_offset into account
> when generating the dma address (but so far I haven't seen any request
> for this from hardware vendors; it can easily be fixed). So if that's
> not the case for Ding, I'm not sure the dma-ranges property would help.

You can ignore dma_pfn_offset for now, until someone needs it. What
we definitely need is the size of the range.

> > However, if the bus is not capable of addressing the entire
> > 32-bit range (as on some modern shmobile machines, or some of the
> > really old machines), we need to limit the mask here already.
> 
> Would the limiting be based on the dma-ranges size property?

Yes.

> Such information is not easily available after of_dma_configure(),
> maybe we could store it somewhere in struct device.

I don't think it's necessary. Setting the dma mask is a rare operation,
and we only need to check again if the new mask is larger than the
old one.

> Going back to the original topic, the dma_supported() function on arm64
> calls swiotlb_dma_supported(), which actually checks whether the swiotlb
> bounce buffer is within the dma mask. This transparent bouncing (unlike
> arm32, where it needs to be explicit) is not always optimal, though it
> is required for 32-bit-only devices on a 64-bit system. The problem is
> when the driver is 64-bit capable but forgets to call
> dma_set_mask_and_coherent() (that's not the only question I got about
> running out of swiotlb buffers).

I think it would be nice to warn once per device that starts using the
swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
attached.

Arnd


Re: For the problem when using swiotlb

2014-11-19 Thread Ding Tianhong
On 2014/11/19 16:45, Arnd Bergmann wrote:
> On Wednesday 19 November 2014 11:17:15 Ding Tianhong wrote:
> > On 2014/11/18 2:09, Catalin Marinas wrote:
> > > On Mon, Nov 17, 2014 at 12:18:42PM +0000, Arnd Bergmann wrote:
> > > > On Monday 17 November 2014 19:56:27 Ding Tianhong wrote:
> > > > > The commit 3690951fc6d42f3a0903987677d0e592c49dd8db ("arm64: Use
> > > > > swiotlb late initialisation") switches the DMA mapping code to
> > > > > swiotlb_tlb_late_init_with_default_size(), and this causes a problem
> > > > > when I run the scsi stress tests; the message is as below:
> > > > >
> > > > > sas_controller b100.sas: swiotlb buffer is full (sz: 65536 bytes)
> > > > > DMA: Out of SW-IOMMU space for 65536 bytes at device b100.sas
> > > > >
> > > > > The reason is that swiotlb_tlb_late_init_with_default_size() can
> > > > > only allocate 16M of memory for DMA mapping, and the cmdline
> > > > > parameter "swiotlb=xxx" is useless because get_free_pages() can
> > > > > only assign a maximum of 16M from the buddy allocator (MAX_ORDER
> > > > > is 13 for 4k pages). Obviously 16M is too small in many scenarios,
> > > > > but swiotlb_init(), which can reserve a bigger memory region as
> > > > > wished, works well for most drivers.
> > > > >
> > > > > I could not find a better way to fix this problem except to revert
> > > > > this patch, so could you please give me some advice and help me,
> > > > > thanks very much.
> > > >
> > > > In general, you should not need to use swiotlb for most devices, in
> > > > particular for high-performance devices like network or block.
> > > >
> > > > Please make sure that you have set up the dma-ranges properties in
> > > > your DT properly to allow 64-bit DMA if the device supports it.
> > >
> > > That's the problem indeed, the DMA API ends up using swiotlb bounce
> > > buffers because the physical addresses of the pages passed to (or
> > > allocated by) the driver are beyond the 32-bit limit (which is the
> > > default dma mask).
> >
> > Thanks everyone, I think I found a way to fix it: enable DMA_CMA to
> > reserve a big memory region for CMA and set the coherent mask for the
> > dev; then dma_alloc and dma_mapping will not use the swiotlb until the
> > memory is outside the mask or swiotlb_force is enabled.
> >
> > If I still understand incorrectly, please inform me.
> 
> Please do not use CMA to work around the problem, but fix the underlying
> bug instead.
> 
> The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> dma mask, and check whether that succeeded. However, the code implementing
> dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> the dma-ranges property (see of_dma_configure()), and check if the mask
> is possible.
> 
> 	Arnd

dma_pfn_offset looks to be supported only on arm32, but my platform is
aarch64, and I checked the latest kernel version: I think dma-ranges
still does not work for aarch64, so maybe we should add dma_pfn_offset
for aarch64 first.

Ding




Re: For the problem when using swiotlb

2014-11-19 Thread Arnd Bergmann
On Thursday 20 November 2014 10:57:53 Ding Tianhong wrote:
> On 2014/11/19 16:45, Arnd Bergmann wrote:
> > On Wednesday 19 November 2014 11:17:15 Ding Tianhong wrote:
> > > On 2014/11/18 2:09, Catalin Marinas wrote:
> > > > On Mon, Nov 17, 2014 at 12:18:42PM +0000, Arnd Bergmann wrote:
> > > Thanks everyone, I think I found a way to fix it: enable DMA_CMA to
> > > reserve a big memory region for CMA and set the coherent mask for the
> > > dev; then dma_alloc and dma_mapping will not use the swiotlb until the
> > > memory is outside the mask or swiotlb_force is enabled.
> > >
> > > If I still understand incorrectly, please inform me.
> > 
> > Please do not use CMA to work around the problem, but fix the underlying
> > bug instead.
> > 
> > The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> > dma mask, and check whether that succeeded. However, the code implementing
> > dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> > the dma-ranges property (see of_dma_configure()), and check if the mask
> > is possible.
> 
> dma_pfn_offset looks to be supported only on arm32, but my platform is
> aarch64, and I checked the latest kernel version: I think dma-ranges
> still does not work for aarch64, so maybe we should add dma_pfn_offset
> for aarch64 first.

I didn't mean the dma_pfn_offset. The problem is that the of_dma_configure
code currently doesn't look at the mask. As I explained in my reply to
Catalin, it should set the mask to the size of the dma-ranges if that is
32-bit or smaller, and dma_set_mask should look at the same dma-ranges
property to decide what to set the mask to when a driver asks for a
mask larger than 32-bit.

Arnd


Re: For the problem when using swiotlb

2014-11-18 Thread Ding Tianhong
On 2014/11/18 2:09, Catalin Marinas wrote:
> On Mon, Nov 17, 2014 at 12:18:42PM +0000, Arnd Bergmann wrote:
>> On Monday 17 November 2014 19:56:27 Ding Tianhong wrote:
>>> The commit 3690951fc6d42f3a0903987677d0e592c49dd8db ("arm64: Use
>>> swiotlb late initialisation") switches the DMA mapping code to
>>> swiotlb_tlb_late_init_with_default_size(), and this causes a problem
>>> when I run the scsi stress tests; the message is as below:
>>>
>>> sas_controller b100.sas: swiotlb buffer is full (sz: 65536 bytes)
>>> DMA: Out of SW-IOMMU space for 65536 bytes at device b100.sas
>>>
>>> The reason is that swiotlb_tlb_late_init_with_default_size() can
>>> only allocate 16M of memory for DMA mapping, and the cmdline
>>> parameter "swiotlb=xxx" is useless because get_free_pages() can only
>>> assign a maximum of 16M from the buddy allocator (MAX_ORDER is 13
>>> for 4k pages). Obviously 16M is too small in many scenarios, but
>>> swiotlb_init(), which can reserve a bigger memory region as wished,
>>> works well for most drivers.
>>>
>>> I could not find a better way to fix this problem except to revert
>>> this patch, so could you please give me some advice and help me,
>>> thanks very much.
>>
>> In general, you should not need to use swiotlb for most devices, in
>> particular for high-performance devices like network or block.
>>
>> Please make sure that you have set up the dma-ranges properties in
>> your DT properly to allow 64-bit DMA if the device supports it.
> 
> That's the problem indeed, the DMA API ends up using swiotlb bounce
> buffers because the physical addresses of the pages passed to (or
> allocated by) the driver are beyond the 32-bit limit (which is the
> default dma mask).
> 

Thanks everyone, I think I found a way to fix it: enable DMA_CMA to
reserve a big memory region for CMA and set the coherent mask for the
dev; then dma_alloc and dma_mapping will not use the swiotlb until the
memory is outside the mask or swiotlb_force is enabled.

If I still understand incorrectly, please inform me.

Regards
Ding



Re: For the problem when using swiotlb

2014-11-18 Thread Ding Tianhong
On 2014/11/18 2:09, Catalin Marinas wrote:
> On Mon, Nov 17, 2014 at 12:18:42PM +, Arnd Bergmann wrote:
>> On Monday 17 November 2014 19:56:27 Ding Tianhong wrote:
>>> The commit 3690951fc6d42f3a0903987677d0e592c49dd8db ("arm64: Use
>>> swiotlb late initialisation") switches the DMA mapping code to
>>> swiotlb_tlb_late_init_with_default_size(); this causes a problem
>>> when I run SCSI stress tests. The message is as below:
>>>
>>> sas_controller b100.sas: swiotlb buffer is full (sz: 65536 bytes)..
>>> DMA: Out of SW-IOMMU space for 65536 bytes at device b100.sas
>>>
>>> The reason is that swiotlb_tlb_late_init_with_default_size() can only
>>> allocate 16M of memory for DMA mapping, and the "swiotlb=xxx" command
>>> line parameter is useless because get_free_pages() uses the buddy
>>> allocator, which can assign at most 16M (MAX_ORDER is 13 for 4k pages).
>>> Obviously 16M is too small in many scenarios, but swiotlb_init(),
>>> which can reserve a bigger memory region as wished, works well for
>>> most drivers.
>>>
>>> I cannot find a better way to fix this problem other than reverting
>>> this patch, so could you please give me some advice and help me?
>>> Thanks very much.
>>
>> In general, you should not need to use swiotlb for most devices, in
>> particular for high-performance devices like network or block.
>>
>> Please make sure that you have set up the dma-ranges properties in
>> your DT properly to allow 64-bit DMA if the device supports it.
>
> That's the problem indeed: the DMA API ends up using swiotlb bounce
> buffers because the physical addresses of the pages passed to (or
> allocated by) the driver are beyond the 32-bit limit (which is the
> default dma mask).

Thanks everyone, I think I have found the way to fix it: enable DMA_CMA to
reserve a big memory region for CMA and set the coherent mask for the
device; then dma_alloc and dma_mapping will not use the swiotlb unless the
memory falls outside the mask or swiotlb_force is enabled.

If my understanding is still incorrect, please let me know.

Regards
Ding



Re: For the problem when using swiotlb

2014-11-17 Thread Catalin Marinas
On Mon, Nov 17, 2014 at 12:18:42PM +, Arnd Bergmann wrote:
> On Monday 17 November 2014 19:56:27 Ding Tianhong wrote:
> > The commit 3690951fc6d42f3a0903987677d0e592c49dd8db ("arm64: Use
> > swiotlb late initialisation") switches the DMA mapping code to
> > swiotlb_tlb_late_init_with_default_size(); this causes a problem
> > when I run SCSI stress tests. The message is as below:
> > 
> > sas_controller b100.sas: swiotlb buffer is full (sz: 65536 bytes)..
> > DMA: Out of SW-IOMMU space for 65536 bytes at device b100.sas
> > 
> > The reason is that swiotlb_tlb_late_init_with_default_size() can only
> > allocate 16M of memory for DMA mapping, and the "swiotlb=xxx" command
> > line parameter is useless because get_free_pages() uses the buddy
> > allocator, which can assign at most 16M (MAX_ORDER is 13 for 4k pages).
> > Obviously 16M is too small in many scenarios, but swiotlb_init(),
> > which can reserve a bigger memory region as wished, works well for
> > most drivers.
> > 
> > I cannot find a better way to fix this problem other than reverting
> > this patch, so could you please give me some advice and help me?
> > Thanks very much.
> 
> In general, you should not need to use swiotlb for most devices, in
> particular for high-performance devices like network or block.
> 
> Please make sure that you have set up the dma-ranges properties in
> your DT properly to allow 64-bit DMA if the device supports it.

That's the problem indeed: the DMA API ends up using swiotlb bounce
buffers because the physical addresses of the pages passed to (or
allocated by) the driver are beyond the 32-bit limit (which is the
default dma mask).

-- 
Catalin


Re: For the problem when using swiotlb

2014-11-17 Thread Arnd Bergmann
On Monday 17 November 2014 19:56:27 Ding Tianhong wrote:
> Hi Catalin:
> The commit 3690951fc6d42f3a0903987677d0e592c49dd8db ("arm64: Use
> swiotlb late initialisation") switches the DMA mapping code to
> swiotlb_tlb_late_init_with_default_size(); this causes a problem
> when I run SCSI stress tests. The message is as below:
> 
> sas_controller b100.sas: swiotlb buffer is full (sz: 65536 bytes)..
> DMA: Out of SW-IOMMU space for 65536 bytes at device b100.sas
> 
> The reason is that swiotlb_tlb_late_init_with_default_size() can only
> allocate 16M of memory for DMA mapping, and the "swiotlb=xxx" command
> line parameter is useless because get_free_pages() uses the buddy
> allocator, which can assign at most 16M (MAX_ORDER is 13 for 4k pages).
> Obviously 16M is too small in many scenarios, but swiotlb_init(),
> which can reserve a bigger memory region as wished, works well for
> most drivers.
> 
> I cannot find a better way to fix this problem other than reverting
> this patch, so could you please give me some advice and help me?
> Thanks very much.

In general, you should not need to use swiotlb for most devices, in
particular for high-performance devices like network or block.

Please make sure that you have set up the dma-ranges properties in
your DT properly to allow 64-bit DMA if the device supports it.
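For completeness, a hedged sketch of what such a dma-ranges property can
look like (the node layout, addresses, and the 1 TiB window size are
purely illustrative, not taken from Ding's actual DT):

```dts
soc {
	compatible = "simple-bus";
	#address-cells = <2>;
	#size-cells = <2>;
	/* <child-bus-address parent-bus-address size>: identity-map the
	 * first 1 TiB so child devices can DMA to all of it. */
	dma-ranges = <0x0 0x0  0x0 0x0  0x100 0x0>;

	/* the SAS controller and other DMA masters sit under this bus */
};
```

With a window like this covering all of RAM, the DMA core can grant the
device a 64-bit mask and never fall back to the swiotlb bounce buffers.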

Arnd

