On 8/27/2014 4:20 PM, Andrew Morton wrote:
> On Wed, 27 Aug 2014 16:15:28 -0700 Mike Travis <tra...@sgi.com> wrote:
> 
>>
>>>
>>>> There are two causes for requiring a restart/reload of the drivers.
>>>> First is periodic preventive maintenance (PM) and the second is if
>>>> any of the devices experience a fatal error.  Both of these trigger
>>>> this excessively long delay in bringing the system back up to full
>>>> capability.
>>>>
>>>> The problem was tracked down to a very slow IOREMAP operation and
>>>> the excessively long ioresource lookup that ensures the user is
>>>> not attempting to ioremap RAM.  These patches speed up that
>>>> function.
>>>
>>> With what result?
>>>
>>
>> Early measurements on our in-house lab system (with far fewer CPUs
>> and less memory) show about a 60-75% improvement.  They have 31 devices,
>> 3000+ CPUs and 10+ TB of memory.  We have 20 devices, 480 CPUs and ~2 TB
>> of memory.  I expect their ioresource list to be about 5-10 times longer.
>> [But their system is in production, so we have to wait for the next
>> scheduled PM interval before a live test can be done.]
> 
> So you expect 1+ hours?  That's still nuts.
> 

Actually, I expect a much bigger improvement.  We are removing passes
through the I/O resource list, and the longer the list, the longer
each complete pass takes.  As mentioned, a 128M I/O BAR region
requires 32 passes, so we are removing 31 of them.  Removing 31
passes over a list that is 5-10 times longer should give a much
larger overall improvement in ioremap time.  The device startup
time will remain, though we are encouraging the vendor to look
at starting the devices up in parallel.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
