On Mon, 2015-06-22 at 09:21 -0700, Mike Travis wrote:
> 
> On 6/19/2015 2:44 PM, Toshi Kani wrote:
> > __ioremap_caller() calls region_is_ram() to look up the resource
> > to check if a target range is RAM, which was added as an additinal
> > check to improve the lookup performance over page_is_ram() (commit
> > 906e36c5c717 "x86: use optimized ioresource lookup in ioremap
> > function").
> > 
> > __ioremap_caller() then calls walk_system_ram_range(), which had
> > replaced page_is_ram() to improve the lookup performance (commit
> > c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").
> > 
> > Since both functions walk through the resource table, there is
> > no need to call the two functions.  Furthermore, region_is_ram()
> > has bugs and always returns with -1.  This makes
> > walk_system_ram_range() as the only check being used.
> 
> Do you have an example of a failing case?  

Well, region_is_ram() fails with -1 for every case... This is because it
breaks the for-loop at 'if (p->end < start)' in the first entry (i.e.
the lowest address range) of the resource table.  For example, the first
entry of 'p->end' is 0xfff on my system, which is certainly smaller than
'start'.

  # cat /proc/iomem
  00000000-00000fff : reserved
  00001000-00092fff : System RAM
  00093000-00093fff : reserved
        :

> Also, I didn't know that
> IOREMAP'd addresses were allowed to be on non-page boundaries?

Yes, that is allowed.  Here is a comment in __ioremap_caller(). 

 * NOTE! We need to allow non-page-aligned mappings too: we will
obviously
 * have to convert them into an offset in a page-aligned mapping, but
the
 * caller shouldn't need to know that small detail.

> Here's the comment and reason for the patches from Patch 0:
> 
> <<<
> We have a large university system in the UK that is experiencing
> very long delays modprobing the driver for a specific I/O device.
> The delay is from 8-10 minutes per device and there are 31 devices
> in the system.  This 4 to 5 hour delay in starting up those I/O
> devices is very much a burden on the customer.
> ...
> The problem was tracked down to a very slow IOREMAP operation and
> the excessively long ioresource lookup to insure that the user is
> not attempting to ioremap RAM.  These patches provide a speed up
> to that function.
> >>>
> 
> The speed up was pretty dramatic, I think to about 15-20 minutes
> (the test was done by our local CS person in the UK).  I think this
> would prove the function was working since it would have fallen
> back to the previous page_is_ram function and the 4 to 5 hour
> startup.

I wonder how this test was conducted.  When the region_is_ram() change
got in, the kernel already had the other speed up change (c81c8a1eeede),
which had replaced page_is_ram().

> If there is a failure, it would be better for all to fix the specific
> bug and not re-introduce the original problem.  Perhaps drop to
> page is ram if the address is not page aligned?

walk_system_ram_range() is the one that has the issue with
page-unaligned address.  region_is_ram() after fixed by patch 3/3 does
not have this issue.  ioremap() does not call page_is_ram() anymore.  

pcibios_add_device(), ksysfs.c, kdebugfs.c (and possibly more) call
ioremap for setup_data, which is page-unaligned.  If we are going to
disallow such callers, they all need to be converted with a different
mapping interface, such as vmap().  But vmap() takes an array of page
pointers as an argument, and is not convenient for them to use.  Since
setup_data is a special range, if they need to be changed may be
arguable.  I think issue 3 described in patch 0/3 needs further
discussion.  For now, I'd like to fix issue 1 & 2.

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to