Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/12/15 at 01:57pm, Dave Young wrote:
> On 05/11/15 at 12:11pm, Joerg Roedel wrote:
> > On Thu, May 07, 2015 at 09:56:00PM +0800, Dave Young wrote:
> > > Joerg, I can not find the last reply from you, so I am just replying here
> > > about my worries.
> > >
> > > I said that the patchset will cause more problems; let me explain more
> > > about it here:
> > >
> > > Suppose the page table was corrupted, i.e. the original mapping
> > > iova1 -> page 1 was accidentally changed to iova1 -> page 2 while the
> > > crash was happening; thus future DMA will read/write page 2 instead of
> > > page 1, right?
> >
> > When the page-table is corrupted then it is a left-over from the old
> > kernel. When the kdump kernel boots, the situation can at least not get
> > worse. For the page tables it is also hard to detect wrong mappings (if
> > this were possible, the hardware could already do it), so any checks we
> > could do there are of limited use anyway.
>
> Joerg, since both of you guys do not think it is a problem I will object it

s/will object/will not object

> any more, though I still do not like reusing the old page tables. So let's
> leave it as a future issue.
>
> Thanks
> Dave

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/11/15 at 12:11pm, Joerg Roedel wrote:
> On Thu, May 07, 2015 at 09:56:00PM +0800, Dave Young wrote:
> > Joerg, I can not find the last reply from you, so I am just replying here
> > about my worries.
> >
> > I said that the patchset will cause more problems; let me explain more
> > about it here:
> >
> > Suppose the page table was corrupted, i.e. the original mapping
> > iova1 -> page 1 was accidentally changed to iova1 -> page 2 while the
> > crash was happening; thus future DMA will read/write page 2 instead of
> > page 1, right?
>
> When the page-table is corrupted then it is a left-over from the old
> kernel. When the kdump kernel boots, the situation can at least not get
> worse. For the page tables it is also hard to detect wrong mappings (if
> this were possible, the hardware could already do it), so any checks we
> could do there are of limited use anyway.

Joerg, since both of you guys do not think it is a problem I will object it
any more, though I still do not like reusing the old page tables. So let's
leave it as a future issue.

Thanks
Dave
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On Thu, May 07, 2015 at 09:56:00PM +0800, Dave Young wrote:
> Joerg, I can not find the last reply from you, so I am just replying here
> about my worries.
>
> I said that the patchset will cause more problems; let me explain more
> about it here:
>
> Suppose the page table was corrupted, i.e. the original mapping
> iova1 -> page 1 was accidentally changed to iova1 -> page 2 while the
> crash was happening; thus future DMA will read/write page 2 instead of
> page 1, right?

When the page-table is corrupted then it is a left-over from the old
kernel. When the kdump kernel boots, the situation can at least not get
worse. For the page tables it is also hard to detect wrong mappings (if
this were possible, the hardware could already do it), so any checks we
could do there are of limited use anyway.

	Joerg
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/07/2015 09:21 PM, Dave Young wrote:
> On 05/07/15 at 10:25am, Don Dutile wrote:
> > On 05/07/2015 10:00 AM, Dave Young wrote:
> > > On 04/07/15 at 10:12am, Don Dutile wrote:
> > > > On 04/06/2015 11:46 PM, Dave Young wrote:
> > > > > On 04/05/15 at 09:54am, Baoquan He wrote:
> > > > > > On 04/03/15 at 05:21pm, Dave Young wrote:
> > > > > > > On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> > > > > > > > Hi Dave,
> > > > > > > >
> > > > > > > > There may be some possibilities that the old iommu data was
> > > > > > > > corrupted by some other module. Currently we do not have a
> > > > > > > > better solution for the DMAR faults.
> > > > > > > >
> > > > > > > > But I think when this happens, we need to fix the module that
> > > > > > > > corrupted the old iommu data. I once met a similar problem in a
> > > > > > > > normal kernel: the queue used by the qi_* functions was
> > > > > > > > overwritten by another module. The fix was in that module, not
> > > > > > > > in the iommu module.
> > > > > > >
> > > > > > > It is too late; there will be no chance to save the vmcore then.
> > > > > > >
> > > > > > > Also, if it is possible to keep corrupting other areas of oldmem
> > > > > > > because of using the old iommu tables, then it will cause more
> > > > > > > problems.
> > > > > > >
> > > > > > > So I think the tables at least need some verification before
> > > > > > > being used.
> > > > > >
> > > > > > Yes, that is good thinking, and verification is also an interesting
> > > > > > idea. kexec/kdump do a sha256 calculation on the loaded kernel and
> > > > > > then verify it again in purgatory when a panic happens. This checks
> > > > > > whether any code stomped into the region reserved for kexec/kdump
> > > > > > and corrupted the loaded kernel.
> > > > > >
> > > > > > If this is decided on, it should be an enhancement to the current
> > > > > > patchset, not an approach change. Since this patchset is very close
> > > > > > to the point the maintainers expected, maybe it can be merged first
> > > > > > and the enhancement considered afterwards. After all, without this
> > > > > > patchset vt-d often raised error messages and hung.
> > > > >
> > > > > It does not convince me; we should do it right at the beginning
> > > > > instead of introducing something wrong.
> > > > >
> > > > > I wonder why the old DMA can not be remapped to a specific page in
> > > > > the kdump kernel so that it will not corrupt more memory. But I may
> > > > > have missed something; I will look for the old threads and catch up.
> > > > >
> > > > > Thanks
> > > > > Dave
> > > >
> > > > The (only) issue is not corruption: once the iommu is re-configured,
> > > > the old, not-stopped-yet DMA engines will use IOVAs that will generate
> > > > DMAR faults, which will be enabled when the iommu is re-configured
> > > > (even to a single/simple paging scheme) in the kexec kernel.
> > >
> > > Don, so if the iommu is not reconfigured, then these faults will not
> > > happen?
> >
> > Well, if the iommu is not reconfigured, then, provided the crash wasn't
> > caused by an IOMMU fault (some systems have firmware-first catch the IOMMU
> > fault and convert it into an NMI_IOCK), the DMAs will continue into the
> > old kernel memory space.
>
> So NMI_IOCK is one reason a kernel can hang. I think I'm still not clear
> about what "re-configured" means, though. Originally DMAR faults happen
> (that is the old behavior), but we are removing the faults by allowing DMA
> to continue into the old memory space.

A flood of faults occurs when the 2nd kernel (re-)configures the IOMMU,
because the second kernel effectively clears/disables all DMA except RMRRs,
so any DMA from the 1st kernel will flood the system with faults. It's the
flood of DMAR faults that eventually wedges and/or crashes the system while
trying to take a kdump.

> > > Baoquan and I ran into a confusion today about iommu=off/intel_iommu=off:
> > >
> > > intel_iommu_init()
> > > {
> > > ...
> > > 	dmar_table_init();
> > >
> > > 	disable active iommu translations;
> > >
> > > 	if (no_iommu || dmar_disabled)
> > > 		goto out_free_dmar;
> > > ...
> > > }
> > >
> > > Any reason not to move the no_iommu check to the beginning of the
> > > intel_iommu_init function?
> >
> > What does that do/help?
>
> I just do not know why the previous handling is necessary with iommu=off;
> shouldn't we do nothing and return earlier?
>
> Also, a guess: the DMAR faults appear after iommu_init, so I am not sure
> whether the code that runs before the dmar_disabled check has some effect
> on enabling the fault messages.
>
> Thanks
> Dave
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/07/2015 10:00 AM, Dave Young wrote:
> On 04/07/15 at 10:12am, Don Dutile wrote:
> > On 04/06/2015 11:46 PM, Dave Young wrote:
> > > On 04/05/15 at 09:54am, Baoquan He wrote:
> > > > On 04/03/15 at 05:21pm, Dave Young wrote:
> > > > > On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> > > > > > Hi Dave,
> > > > > >
> > > > > > There may be some possibilities that the old iommu data was
> > > > > > corrupted by some other module. Currently we do not have a better
> > > > > > solution for the DMAR faults.
> > > > > >
> > > > > > But I think when this happens, we need to fix the module that
> > > > > > corrupted the old iommu data. I once met a similar problem in a
> > > > > > normal kernel: the queue used by the qi_* functions was overwritten
> > > > > > by another module. The fix was in that module, not in the iommu
> > > > > > module.
> > > > >
> > > > > It is too late; there will be no chance to save the vmcore then.
> > > > >
> > > > > Also, if it is possible to keep corrupting other areas of oldmem
> > > > > because of using the old iommu tables, then it will cause more
> > > > > problems.
> > > > >
> > > > > So I think the tables at least need some verification before being
> > > > > used.
> > > >
> > > > Yes, that is good thinking, and verification is also an interesting
> > > > idea. kexec/kdump do a sha256 calculation on the loaded kernel and then
> > > > verify it again in purgatory when a panic happens. This checks whether
> > > > any code stomped into the region reserved for kexec/kdump and corrupted
> > > > the loaded kernel.
> > > >
> > > > If this is decided on, it should be an enhancement to the current
> > > > patchset, not an approach change. Since this patchset is very close to
> > > > the point the maintainers expected, maybe it can be merged first and
> > > > the enhancement considered afterwards. After all, without this patchset
> > > > vt-d often raised error messages and hung.
> > >
> > > It does not convince me; we should do it right at the beginning instead
> > > of introducing something wrong.
> > >
> > > I wonder why the old DMA can not be remapped to a specific page in the
> > > kdump kernel so that it will not corrupt more memory. But I may have
> > > missed something; I will look for the old threads and catch up.
> > >
> > > Thanks
> > > Dave
> >
> > The (only) issue is not corruption: once the iommu is re-configured, the
> > old, not-stopped-yet DMA engines will use IOVAs that will generate DMAR
> > faults, which will be enabled when the iommu is re-configured (even to a
> > single/simple paging scheme) in the kexec kernel.
>
> Don, so if the iommu is not reconfigured, then these faults will not happen?
>
> Baoquan and I ran into a confusion today about iommu=off/intel_iommu=off:
>
> intel_iommu_init()
> {
> ...
> 	dmar_table_init();
>
> 	disable active iommu translations;
>
> 	if (no_iommu || dmar_disabled)
> 		goto out_free_dmar;
> ...
> }
>
> Any reason not to move the no_iommu check to the beginning of the
> intel_iommu_init function?
>
> Thanks
> Dave

Looks like you could.
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/07/15 at 10:25am, Don Dutile wrote:
> On 05/07/2015 10:00 AM, Dave Young wrote:
> > On 04/07/15 at 10:12am, Don Dutile wrote:
> > > On 04/06/2015 11:46 PM, Dave Young wrote:
> > > > On 04/05/15 at 09:54am, Baoquan He wrote:
> > > > > On 04/03/15 at 05:21pm, Dave Young wrote:
> > > > > > On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> > > > > > > Hi Dave,
> > > > > > >
> > > > > > > There may be some possibilities that the old iommu data was
> > > > > > > corrupted by some other module. Currently we do not have a better
> > > > > > > solution for the DMAR faults.
> > > > > > >
> > > > > > > But I think when this happens, we need to fix the module that
> > > > > > > corrupted the old iommu data. I once met a similar problem in a
> > > > > > > normal kernel: the queue used by the qi_* functions was
> > > > > > > overwritten by another module. The fix was in that module, not in
> > > > > > > the iommu module.
> > > > > >
> > > > > > It is too late; there will be no chance to save the vmcore then.
> > > > > >
> > > > > > Also, if it is possible to keep corrupting other areas of oldmem
> > > > > > because of using the old iommu tables, then it will cause more
> > > > > > problems.
> > > > > >
> > > > > > So I think the tables at least need some verification before being
> > > > > > used.
> > > > >
> > > > > Yes, that is good thinking, and verification is also an interesting
> > > > > idea. kexec/kdump do a sha256 calculation on the loaded kernel and
> > > > > then verify it again in purgatory when a panic happens. This checks
> > > > > whether any code stomped into the region reserved for kexec/kdump and
> > > > > corrupted the loaded kernel.
> > > > >
> > > > > If this is decided on, it should be an enhancement to the current
> > > > > patchset, not an approach change. Since this patchset is very close
> > > > > to the point the maintainers expected, maybe it can be merged first
> > > > > and the enhancement considered afterwards. After all, without this
> > > > > patchset vt-d often raised error messages and hung.
> > > >
> > > > It does not convince me; we should do it right at the beginning instead
> > > > of introducing something wrong.
> > > >
> > > > I wonder why the old DMA can not be remapped to a specific page in the
> > > > kdump kernel so that it will not corrupt more memory. But I may have
> > > > missed something; I will look for the old threads and catch up.
> > > >
> > > > Thanks
> > > > Dave
> > >
> > > The (only) issue is not corruption: once the iommu is re-configured, the
> > > old, not-stopped-yet DMA engines will use IOVAs that will generate DMAR
> > > faults, which will be enabled when the iommu is re-configured (even to a
> > > single/simple paging scheme) in the kexec kernel.
> >
> > Don, so if the iommu is not reconfigured, then these faults will not
> > happen?
>
> Well, if the iommu is not reconfigured, then, provided the crash wasn't
> caused by an IOMMU fault (some systems have firmware-first catch the IOMMU
> fault and convert it into an NMI_IOCK), the DMAs will continue into the old
> kernel memory space.

So NMI_IOCK is one reason a kernel can hang. I think I'm still not clear
about what "re-configured" means, though. Originally DMAR faults happen
(that is the old behavior), but we are removing the faults by allowing DMA
to continue into the old memory space.

> > Baoquan and I ran into a confusion today about iommu=off/intel_iommu=off:
> >
> > intel_iommu_init()
> > {
> > ...
> > 	dmar_table_init();
> >
> > 	disable active iommu translations;
> >
> > 	if (no_iommu || dmar_disabled)
> > 		goto out_free_dmar;
> > ...
> > }
> >
> > Any reason not to move the no_iommu check to the beginning of the
> > intel_iommu_init function?
>
> What does that do/help?

I just do not know why the previous handling is necessary with iommu=off;
shouldn't we do nothing and return earlier?

Also, a guess: the DMAR faults appear after iommu_init, so I am not sure
whether the code that runs before the dmar_disabled check has some effect on
enabling the fault messages.

Thanks
Dave
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/07/2015 10:00 AM, Dave Young wrote:
> On 04/07/15 at 10:12am, Don Dutile wrote:
> > On 04/06/2015 11:46 PM, Dave Young wrote:
> > > On 04/05/15 at 09:54am, Baoquan He wrote:
> > > > On 04/03/15 at 05:21pm, Dave Young wrote:
> > > > > On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> > > > > > Hi Dave,
> > > > > >
> > > > > > There may be some possibilities that the old iommu data was
> > > > > > corrupted by some other module. Currently we do not have a better
> > > > > > solution for the DMAR faults.
> > > > > >
> > > > > > But I think when this happens, we need to fix the module that
> > > > > > corrupted the old iommu data. I once met a similar problem in a
> > > > > > normal kernel: the queue used by the qi_* functions was overwritten
> > > > > > by another module. The fix was in that module, not in the iommu
> > > > > > module.
> > > > >
> > > > > It is too late; there will be no chance to save the vmcore then.
> > > > >
> > > > > Also, if it is possible to keep corrupting other areas of oldmem
> > > > > because of using the old iommu tables, then it will cause more
> > > > > problems.
> > > > >
> > > > > So I think the tables at least need some verification before being
> > > > > used.
> > > >
> > > > Yes, that is good thinking, and verification is also an interesting
> > > > idea. kexec/kdump do a sha256 calculation on the loaded kernel and then
> > > > verify it again in purgatory when a panic happens. This checks whether
> > > > any code stomped into the region reserved for kexec/kdump and corrupted
> > > > the loaded kernel.
> > > >
> > > > If this is decided on, it should be an enhancement to the current
> > > > patchset, not an approach change. Since this patchset is very close to
> > > > the point the maintainers expected, maybe it can be merged first and
> > > > the enhancement considered afterwards. After all, without this patchset
> > > > vt-d often raised error messages and hung.
> > >
> > > It does not convince me; we should do it right at the beginning instead
> > > of introducing something wrong.
> > >
> > > I wonder why the old DMA can not be remapped to a specific page in the
> > > kdump kernel so that it will not corrupt more memory. But I may have
> > > missed something; I will look for the old threads and catch up.
> > >
> > > Thanks
> > > Dave
> >
> > The (only) issue is not corruption: once the iommu is re-configured, the
> > old, not-stopped-yet DMA engines will use IOVAs that will generate DMAR
> > faults, which will be enabled when the iommu is re-configured (even to a
> > single/simple paging scheme) in the kexec kernel.
>
> Don, so if the iommu is not reconfigured, then these faults will not happen?

Well, if the iommu is not reconfigured, then, provided the crash wasn't
caused by an IOMMU fault (some systems have firmware-first catch the IOMMU
fault and convert it into an NMI_IOCK), the DMAs will continue into the old
kernel memory space.

> Baoquan and I ran into a confusion today about iommu=off/intel_iommu=off:
>
> intel_iommu_init()
> {
> ...
> 	dmar_table_init();
>
> 	disable active iommu translations;
>
> 	if (no_iommu || dmar_disabled)
> 		goto out_free_dmar;
> ...
> }
>
> Any reason not to move the no_iommu check to the beginning of the
> intel_iommu_init function?

What does that do/help?

> Thanks
> Dave
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 04/07/15 at 10:12am, Don Dutile wrote:
> On 04/06/2015 11:46 PM, Dave Young wrote:
> > On 04/05/15 at 09:54am, Baoquan He wrote:
> > > On 04/03/15 at 05:21pm, Dave Young wrote:
> > > > On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> > > > > Hi Dave,
> > > > >
> > > > > There may be some possibilities that the old iommu data was corrupted
> > > > > by some other module. Currently we do not have a better solution for
> > > > > the DMAR faults.
> > > > >
> > > > > But I think when this happens, we need to fix the module that
> > > > > corrupted the old iommu data. I once met a similar problem in a
> > > > > normal kernel: the queue used by the qi_* functions was overwritten
> > > > > by another module. The fix was in that module, not in the iommu
> > > > > module.
> > > >
> > > > It is too late; there will be no chance to save the vmcore then.
> > > >
> > > > Also, if it is possible to keep corrupting other areas of oldmem
> > > > because of using the old iommu tables, then it will cause more
> > > > problems.
> > > >
> > > > So I think the tables at least need some verification before being
> > > > used.
> > >
> > > Yes, that is good thinking, and verification is also an interesting
> > > idea. kexec/kdump do a sha256 calculation on the loaded kernel and then
> > > verify it again in purgatory when a panic happens. This checks whether
> > > any code stomped into the region reserved for kexec/kdump and corrupted
> > > the loaded kernel.
> > >
> > > If this is decided on, it should be an enhancement to the current
> > > patchset, not an approach change. Since this patchset is very close to
> > > the point the maintainers expected, maybe it can be merged first and the
> > > enhancement considered afterwards. After all, without this patchset vt-d
> > > often raised error messages and hung.
> >
> > It does not convince me; we should do it right at the beginning instead of
> > introducing something wrong.
> >
> > I wonder why the old DMA can not be remapped to a specific page in the
> > kdump kernel so that it will not corrupt more memory. But I may have
> > missed something; I will look for the old threads and catch up.
> >
> > Thanks
> > Dave
>
> The (only) issue is not corruption: once the iommu is re-configured, the
> old, not-stopped-yet DMA engines will use IOVAs that will generate DMAR
> faults, which will be enabled when the iommu is re-configured (even to a
> single/simple paging scheme) in the kexec kernel.

Don, so if the iommu is not reconfigured, then these faults will not happen?

Baoquan and I ran into a confusion today about iommu=off/intel_iommu=off:

intel_iommu_init()
{
...
	dmar_table_init();

	disable active iommu translations;

	if (no_iommu || dmar_disabled)
		goto out_free_dmar;
...
}

Any reason not to move the no_iommu check to the beginning of the
intel_iommu_init function?

Thanks
Dave
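The ordering question can be made concrete with a small userspace simulation. This is not the real driver code: the flag names mirror the kernel's no_iommu/dmar_disabled, but dmar_table_init() is a stub and the early-check variant is only the proposed reordering, not an accepted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the intel_iommu_init() ordering question.  The names
 * mirror the kernel's flags, but everything here is a stub. */
static bool no_iommu, dmar_disabled;
static int tables_parsed;           /* records whether dmar_table_init() ran */

static void dmar_table_init(void) { tables_parsed = 1; }

/* Current ordering: parse the DMAR tables (and, in the real driver,
 * disable any active translations) before checking the disable flags. */
static int intel_iommu_init_current(void)
{
    dmar_table_init();
    /* "disable active iommu translations" would happen here */
    if (no_iommu || dmar_disabled)
        return -1;                  /* goto out_free_dmar */
    return 0;
}

/* Proposed reordering: bail out first and touch nothing. */
static int intel_iommu_init_early_check(void)
{
    if (no_iommu || dmar_disabled)
        return -1;
    dmar_table_init();
    return 0;
}
```

One possible answer to "why not" is that even with intel_iommu=off the driver may still want the parsed DMAR tables and the chance to shut down translations the old kernel left enabled; that is speculation here, not something the thread settles.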
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/04/15 at 01:05pm, Joerg Roedel wrote:
> On Fri, Apr 03, 2015 at 04:40:31PM +0800, Dave Young wrote:
> > Have not read all the patches, but I have a question; not sure this has
> > been answered before. Old memory is not reliable; what if the old memory
> > got corrupted before the panic? Is it safe to continue using it in the
> > 2nd kernel? I worry that it will cause problems.
>
> Yes, the old memory could be corrupted, and there are more failure cases
> left which we have no way of handling yet (if iommu data structures are in
> kdump backup areas).
>
> The question is what to do if we find some of the old data structures
> corrupted, and how far the tests should go. Should we also check the
> page-tables, for example? I think if some of the data structures for a
> device are corrupted it probably already failed in the old kernel, and
> things won't get worse in the new one.

Joerg, I can not find your last reply, so I am just replying here about my
worries.

I said that the patchset will cause more problems; let me explain more about
it here:

Suppose the page table was corrupted, i.e. the original mapping
iova1 -> page 1 was accidentally changed to iova1 -> page 2 while the crash
was happening; thus future DMA will read/write page 2 instead of page 1,
right?

So the behavior changes like this: originally, DMAR faults happen, but the
kdump kernel may still boot with these faults and the vmcore can be saved;
with the patchset, the DMAR faults do not happen and the DMA translation
will be handled, but a DMA write could corrupt unrelated memory.

This might be a corner case, but who knows; because the kernel panicked we
can not assume the old page table is right.

It seems you all think it is safe, but let us understand each other first
and then settle on a solution. Today I talked with Zhenhua about the
problem; I think both of us are now clear about the problems. He just thinks
it can be left as future work.

Thanks
Dave
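The corruption scenario Dave describes can be sketched in a few lines. This is a deliberately tiny single-level model with made-up names (real VT-d tables are multi-level), only meant to show why software cannot notice the misdirected write.

```c
#include <assert.h>
#include <string.h>

/* Toy single-level "IOMMU page table": pt[iova] gives a page index.
 * A DMA write goes wherever the table points; nothing in the data path
 * can tell that the entry was corrupted during the crash. */
enum { PAGES = 4, PAGE_SIZE = 16 };
static char mem[PAGES][PAGE_SIZE];
static int pt[1];

static void dma_write(int iova, const char *data)
{
    memcpy(mem[pt[iova]], data, strlen(data) + 1);
}
```

With pt[0] = 1 the write lands in page 1; after the entry is flipped to 2, the same device write silently lands in page 2, which is exactly the "corrupt unrelated memory" behavior described above.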
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/06/15 at 10:16am, Joerg Roedel wrote:
> On Wed, May 06, 2015 at 09:46:49AM +0800, Dave Young wrote:
> > For the original problem, the key issue is that DMAR faults cause the
> > kdump kernel to hang so that the vmcore can not be saved. I do not know
> > the reason why it hangs; I think it would be acceptable if the kdump
> > kernel booted fine with some DMA errors.
>
> It hangs because some devices can't handle the DMAR faults, so the kdump
> kernel can't initialize them and will hang itself. For that it doesn't
> matter whether the fault was caused by a read or a write request.

Ok, thanks for the explanation. That explains why the kdump kernel sometimes
boots fine with faults and sometimes hangs instead.

Dave
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On Wed, May 06, 2015 at 09:46:49AM +0800, Dave Young wrote:
> For the original problem, the key issue is that DMAR faults cause the kdump
> kernel to hang so that the vmcore can not be saved. I do not know the
> reason why it hangs; I think it would be acceptable if the kdump kernel
> booted fine with some DMA errors.

It hangs because some devices can't handle the DMAR faults, so the kdump
kernel can't initialize them and will hang itself. For that it doesn't
matter whether the fault was caused by a read or a write request.

	Joerg
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/05/15 at 05:23pm, Joerg Roedel wrote:
> On Tue, May 05, 2015 at 02:09:31PM +0800, Dave Young wrote:
> > I agree that we can do nothing with the old corrupted data, but I worry
> > about future corruption from using the old corrupted data. I wonder if we
> > can mark all of the oldmem as read-only so that we can lower the risk. Is
> > that reasonable?
>
> Do you mean marking it read-only for the devices? That will very likely
> cause DMAR faults, re-introducing the problem this patch-set tries to fix.

I mean to block all DMA writes to oldmem; I believe that will cause DMA
errors, but all other DMA read requests will continue to work. This would
avoid possible future corruption. It would solve at least half of the
problem, wouldn't it?

For the original problem, the key issue is that DMAR faults cause the kdump
kernel to hang so that the vmcore can not be saved. I do not know the reason
why it hangs; I think it would be acceptable if the kdump kernel booted fine
with some DMA errors.

Thanks
Dave
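Blocking DMA writes while keeping reads amounts to clearing the write-permission bit in every old page-table entry. A minimal sketch, assuming a flat array of VT-d-style PTEs: the bit positions follow the driver's DMA_PTE_READ (bit 0) and DMA_PTE_WRITE (bit 1), but the walk over a real multi-level table is omitted and make_oldmem_readonly is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_PTE_READ  (1ULL << 0)   /* bit layout as in the intel-iommu driver */
#define DMA_PTE_WRITE (1ULL << 1)

/* Clear write permission on every present entry so device reads still
 * translate but device writes fault instead of landing in oldmem. */
static void make_oldmem_readonly(uint64_t *pte, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pte[i] & DMA_PTE_READ)  /* skip non-present entries */
            pte[i] &= ~DMA_PTE_WRITE;
}
```

As Joerg points out above, every blocked write still raises a DMAR fault, so this trades memory safety for exactly the fault storm the patchset is trying to avoid.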
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On Tue, May 05, 2015 at 02:09:31PM +0800, Dave Young wrote:
> I agree that we can do nothing with the old corrupted data, but I worry
> about future corruption from using the old corrupted data. I wonder if we
> can mark all of the oldmem as read-only so that we can lower the risk. Is
> that reasonable?

Do you mean marking it read-only for the devices? That will very likely
cause DMAR faults, re-introducing the problem this patch-set tries to fix.

	Joerg
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/04/15 at 01:05pm, Joerg Roedel wrote:
> On Fri, Apr 03, 2015 at 04:40:31PM +0800, Dave Young wrote:
> > Have not read all the patches, but I have a question; not sure this has
> > been answered before. Old memory is not reliable; what if the old memory
> > got corrupted before the panic? Is it safe to continue using it in the
> > 2nd kernel? I worry that it will cause problems.
>
> Yes, the old memory could be corrupted, and there are more failure cases
> left which we have no way of handling yet (if iommu data structures are in
> kdump backup areas).
>
> The question is what to do if we find some of the old data structures
> corrupted, and how far the tests should go. Should we also check the
> page-tables, for example? I think if some of the data structures for a
> device are corrupted it probably already failed in the old kernel, and
> things won't get worse in the new one.

I agree that we can do nothing with the old corrupted data, but I worry
about future corruption from using the old corrupted data. I wonder if we
can mark all of the oldmem as read-only so that we can lower the risk. Is
that reasonable?

Thanks
Dave
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/04/2015 07:05 AM, Joerg Roedel wrote:
> On Fri, Apr 03, 2015 at 04:40:31PM +0800, Dave Young wrote:
> > Have not read all the patches, but I have a question; not sure this has
> > been answered before. Old memory is not reliable; what if the old memory
> > got corrupted before the panic? Is it safe to continue using it in the
> > 2nd kernel? I worry that it will cause problems.
>
> Yes, the old memory could be corrupted, and there are more failure cases
> left which we have no way of handling yet (if iommu data structures are in
> kdump backup areas).
>
> The question is what to do if we find some of the old data structures
> corrupted, and how far the tests should go. Should we also check the
> page-tables, for example? I think if some of the data structures for a
> device are corrupted it probably already failed in the old kernel, and
> things won't get worse in the new one.
>
> So checking is not strictly necessary in the first version of these patches
> (unless we find a valid failure scenario). Once we have a good plan for
> what to do if we find corruption, we can of course add checking.
>
> Regards,
> Joerg

Agreed. This is a significant improvement over what we (don't) have.

Corruption related to the IOMMU must occur within the host, and it must be
software corruption, b/c the IOMMU inherently protects itself by protecting
all of memory from errant DMAs. Therefore, if the only IOMMU corruptor is in
the host, it's likely the entire host kernel crash dump will either be
useless, or corrupted by the security breach, at which point this is just
another scenario of a failed crash dump that will never be taken.

The kernel can't protect the mapping tables, which are the most likely area
to be corrupted, b/c it'd (minimally) have to be per-device (to avoid
locking & coherency issues), and would require significant overhead to
keep/update a checksum-like scheme on (potentially) 4 levels of page tables.
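For scale, the checksum idea Don is weighing is the same pattern kexec/kdump already uses for the loaded kernel: hash the protected region at setup time, re-hash at crash time, and refuse to trust it on mismatch. A sketch with FNV-1a standing in for the sha256 that kexec actually uses; table_hash and the flat byte region are illustrative stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, standing in for sha256: hash the table pages once when they
 * are set up, and compare at crash time before reusing them.  A mismatch
 * means something stomped on the tables after the hash was taken. */
static uint64_t table_hash(const uint8_t *p, size_t len)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;              /* FNV-1a prime */
    }
    return h;
}
```

Don's overhead objection shows up immediately in this model: every legitimate map/unmap in the old kernel would have to update the stored hash, across up to four table levels, per device.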
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On Fri, Apr 03, 2015 at 04:40:31PM +0800, Dave Young wrote:
> Have not read all the patches, but I have a question; not sure this has
> been answered before. Old memory is not reliable; what if the old memory
> got corrupted before the panic? Is it safe to continue using it in the 2nd
> kernel? I worry that it will cause problems.

Yes, the old memory could be corrupted, and there are more failure cases
left which we have no way of handling yet (if iommu data structures are in
kdump backup areas).

The question is what to do if we find some of the old data structures
corrupted, and how far the tests should go. Should we also check the
page-tables, for example? I think if some of the data structures for a
device are corrupted it probably already failed in the old kernel, and
things won't get worse in the new one.

So checking is not strictly necessary in the first version of these patches
(unless we find a valid failure scenario). Once we have a good plan for what
to do if we find corruption, we can of course add checking.

Regards,
	Joerg
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 04/07/15 at 05:55pm, Li, ZhenHua wrote:
> On 04/07/2015 05:08 PM, Dave Young wrote:
> > On 04/07/15 at 11:46am, Dave Young wrote:
> > > On 04/05/15 at 09:54am, Baoquan He wrote:
> > > > On 04/03/15 at 05:21pm, Dave Young wrote:
> > > > > On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> > > > > > Hi Dave,
> > > > > >
> > > > > > There may be some possibilities that the old iommu data was
> > > > > > corrupted by some other module. Currently we do not have a better
> > > > > > solution for the DMAR faults.
> > > > > >
> > > > > > But I think when this happens, we need to fix the module that
> > > > > > corrupted the old iommu data. I once met a similar problem in a
> > > > > > normal kernel: the queue used by the qi_* functions was overwritten
> > > > > > by another module. The fix was in that module, not in the iommu
> > > > > > module.
> > > > >
> > > > > It is too late; there will be no chance to save the vmcore then.
> > > > >
> > > > > Also, if it is possible to keep corrupting other areas of oldmem
> > > > > because of using the old iommu tables, then it will cause more
> > > > > problems.
> > > > >
> > > > > So I think the tables at least need some verification before being
> > > > > used.
> > > >
> > > > Yes, that is good thinking, and verification is also an interesting
> > > > idea. kexec/kdump do a sha256 calculation on the loaded kernel and then
> > > > verify it again in purgatory when a panic happens. This checks whether
> > > > any code stomped into the region reserved for kexec/kdump and corrupted
> > > > the loaded kernel.
> > > >
> > > > If this is decided on, it should be an enhancement to the current
> > > > patchset, not an approach change. Since this patchset is very close to
> > > > the point the maintainers expected, maybe it can be merged first and
> > > > the enhancement considered afterwards. After all, without this patchset
> > > > vt-d often raised error messages and hung.
> > >
> > > It does not convince me; we should do it right at the beginning instead
> > > of introducing something wrong.
> > >
> > > I wonder why the old DMA can not be remapped to a specific page in the
> > > kdump kernel so that it will not corrupt more memory. But I may have
> > > missed something; I will look for the old threads and catch up.
> >
> > I have read the old discussion; that approach was dropped because it could
> > corrupt the filesystem. Apologies for commenting late.
> >
> > But the current solution sounds bad to me because it uses old memory,
> > which is not reliable.
> >
> > Thanks
> > Dave
>
> Seems we do not have a better solution for the DMAR faults. But I believe
> we can find out how to verify the iommu data which is located in old
> memory.

That will be great, thanks.

So there are two things:

1) Make sure the old page tables are right; this is what we were talking
about.

2) Avoid writing old memory. I suppose only a DMA read could corrupt the
filesystem, right? So how about creating a scratch page in 2nd-kernel memory
for any DMA writes, and only using the old page tables for DMA reads.

Thanks
Dave
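Dave's scratch-page idea can be sketched on a flat array of PTE-like entries (hypothetical names; real VT-d tables are multi-level and may use superpages): writable entries are re-pointed at one page owned by the kdump kernel, while read-only entries keep their old target so in-flight reads still see correct data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_READ   (1ULL << 0)
#define PTE_WRITE  (1ULL << 1)
#define ADDR_MASK  (~0xFFFULL)      /* 4 KiB page-frame bits */

/* Re-point every writable entry at a single scratch page so stray device
 * writes can no longer corrupt oldmem; leave read-only entries alone so
 * device reads still return the old kernel's data. */
static void redirect_writes(uint64_t *pte, size_t n, uint64_t scratch_pa)
{
    for (size_t i = 0; i < n; i++)
        if (pte[i] & PTE_WRITE)
            pte[i] = (pte[i] & ~ADDR_MASK) | (scratch_pa & ADDR_MASK);
}
```

One shared scratch page means concurrent writes from different devices land on top of each other, but since the data is discarded anyway, the only goal is keeping it out of real memory. No faults are raised, which is what distinguishes this from the read-only approach discussed earlier in the thread.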
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 04/06/2015 11:46 PM, Dave Young wrote:
> [snip]
>
> It does not convince me, we should do it right at the beginning instead of introducing something wrong.
>
> I wonder why the old dma can not be remapped to a specific page in the kdump kernel so that it will not corrupt more memory. But I may have missed something, I will look for the old threads and catch up.
>
> Thanks
> Dave

The (only) issue is not corruption, but once the iommu is re-configured, the old, not-yet-stopped dma engines will use iovas that will generate dmar faults, which will be enabled when the iommu is re-configured (even to a single/simple paging scheme) in the kexec kernel.
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 04/07/2015 05:08 PM, Dave Young wrote:
> [snip]
>
> I have read the old discussion, the above way was dropped because it could corrupt the filesystem. Apologies for the late comment.
>
> But the current solution sounds bad to me because it uses old memory which is not reliable.
>
> Thanks
> Dave

Seems we do not have a better solution for the dmar faults. But I believe we can find out how to verify the iommu data which is located in old memory.

Thanks
Zhenhua
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 04/07/15 at 11:46am, Dave Young wrote:
> [snip]
>
> It does not convince me, we should do it right at the beginning instead of introducing something wrong.
>
> I wonder why the old dma can not be remapped to a specific page in the kdump kernel so that it will not corrupt more memory. But I may have missed something, I will look for the old threads and catch up.

I have read the old discussion, the above way was dropped because it could corrupt the filesystem. Apologies for the late comment.

But the current solution sounds bad to me because it uses old memory which is not reliable.

Thanks
Dave
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 04/05/15 at 09:54am, Baoquan He wrote:
> [snip]
>
> Yes, it's good thinking about this and verification is also an interesting idea. kexec/kdump do a sha256 calculation on the loaded kernel and then verify it again when panic happens in purgatory. This checks whether any code stomps into the region reserved for the kexec kernel and corrupts the loaded kernel.
>
> If this is decided it should be an enhancement to the current patchset, not an approach change. Since this patchset is very close to the point the maintainers expected, maybe it can be merged first, and then we can think about enhancements. After all, without this patchset vt-d often raised error messages and hung.

It does not convince me, we should do it right at the beginning instead of introducing something wrong.

I wonder why the old dma can not be remapped to a specific page in the kdump kernel so that it will not corrupt more memory. But I may have missed something, I will look for the old threads and catch up.

Thanks
Dave
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 04/03/15 at 02:05pm, Li, Zhen-Hua wrote:
> The hardware will do some verification, but not completely. If people think the OS should also do this, then it should be another patchset, I think.

If there is a chance of corrupting more memory, I think it is not the right way. We should think about a better solution instead of fixing it later.

Thanks
Dave
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 04/03/15 at 05:21pm, Dave Young wrote:
> On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> > Hi Dave,
> >
> > There may be some possibilities that the old iommu data is corrupted by some other modules. Currently we do not have a better solution for the dmar faults.
> >
> > But I think when this happens, we need to fix the module that corrupted the old iommu data. I once met a similar problem in a normal kernel, the queue used by the qi_* functions was written again by another module. The fix was in that module, not in the iommu module.
>
> It is too late, there will be no chance to save the vmcore then.
>
> Also if it is possible to continue to corrupt other areas of oldmem because of using the old iommu tables then it will cause more problems.
>
> So I think the tables at least need some verification before being used.

Yes, it's good thinking about this and verification is also an interesting idea. kexec/kdump do a sha256 calculation on the loaded kernel and then verify it again when panic happens in purgatory. This checks whether any code stomps into the region reserved for the kexec kernel and corrupts the loaded kernel.

If this is decided it should be an enhancement to the current patchset, not an approach change. Since this patchset is very close to the point the maintainers expected, maybe it can be merged first, and then we can think about enhancements. After all, without this patchset vt-d often raised error messages and hung.

By the way, I tested this patchset and it works very well on my HP z420 workstation.

Thanks
Baoquan
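[Editorial note: the verify-before-use pattern Baoquan describes can be sketched as below. This is a hedged illustration — a toy FNV-1a digest stands in for the sha256 used by kexec/kdump purgatory, and the struct and function names are hypothetical.]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Toy digest standing in for the sha256 used by kexec/kdump purgatory. */
static uint64_t fnv1a(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Record a digest when the region is set up, re-check before reuse.
 * Applied to old iommu tables, a mismatch would mean "do not trust
 * the old tables" rather than silently reusing corrupted mappings. */
struct guarded_region {
    const void *base;
    size_t len;
    uint64_t digest;
};

static void guard_region(struct guarded_region *g, const void *base, size_t len)
{
    g->base = base;
    g->len = len;
    g->digest = fnv1a(base, len);
}

static int region_intact(const struct guarded_region *g)
{
    return fnv1a(g->base, g->len) == g->digest;
}
```

The catch, as noted elsewhere in the thread, is that the old kernel would have to compute and store the digest before crashing, and the hardware keeps walking the tables in the meantime.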
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> Hi Dave,
>
> There may be some possibilities that the old iommu data is corrupted by some other modules. Currently we do not have a better solution for the dmar faults.
>
> But I think when this happens, we need to fix the module that corrupted the old iommu data. I once met a similar problem in a normal kernel, the queue used by the qi_* functions was written again by another module. The fix was in that module, not in the iommu module.

It is too late, there will be no chance to save the vmcore then.

Also if it is possible to continue to corrupt other areas of oldmem because of using the old iommu tables then it will cause more problems.

So I think the tables at least need some verification before being used.

> [snip]
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 03/19/15 at 01:36pm, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset, and implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports intel vt-d, when a panic happens, the kdump kernel will boot with these faults:
>
> dmar: DRHD: handling fault status reg 102
> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff8
> DMAR:[fault reason 01] Present bit in root entry is clear
>
> dmar: DRHD: handling fault status reg 2
> dmar: INTR-REMAP: Request device [61:00.0] fault index 42
> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>
> On some systems, the interrupt remapping fault will also happen even if intel_iommu is not set to on, because interrupt remapping will be enabled when x2apic is needed by the system.
>
> The cause of the DMA fault is described in Bill's original version, and the INTR-REMAP fault is caused by a similar reason. In short, the initialization of the vt-d drivers causes the in-flight DMA and interrupt requests to get a wrong response.
>
> To fix this problem, we modify the behavior of the intel vt-d in the crashdump kernel:
>
> For DMA Remapping:
> 1. Accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the translation, keep it enabled.
> 3. Use the old root entry table, do not rewrite the RTA register.
> 4. Malloc and use new context entry tables, copy data from the old ones that were used by the old kernel.
> 5. Keep using the old page tables before the driver is loaded.
> 6. After the device driver is loaded, when it issues the first dma_map command, free the dmar_domain structure for this device and generate a new one, so that the device can be assigned a new and empty page table.
> 7. When a new context entry table is generated, also save its address to the old root entry table.
>
> For Interrupt Remapping:
> 1. Accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
> 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
> 4. When an ioapic entry is set up, the interrupt remapping table is changed, and the updated data will be stored to the old interrupt remapping table.
>
> Advantages of this approach:
> 1. All manipulation of the IO-device is done by the Linux device-driver for that device.
> 2. This approach behaves in a manner very similar to operation without an active iommu.
> 3. Any activity between the IO-device and its RMRR areas is handled by the device-driver in the same manner as during a non-kdump boot.
> 4. If an IO-device has no driver in the kdump kernel, it is simply left alone. This supports the practice of creating a special kdump kernel without drivers for any devices that are not required for taking a crashdump.
> 5. Minimal code changes among the existing mainline intel vt-d code.
>
> Summary of changes in this patch set:
> 1. Added some useful functions for the root entry table in intel-iommu.c
> 2. Added new members to struct root_entry and struct irte;
> 3. Functions to load the old root entry table to iommu->root_entry from the memory of the old kernel.
> 4. Functions to malloc new context entry tables and copy the data from the old ones to the malloced new ones.
> 5. Functions to enable support for DMA remapping in the kdump kernel.
> 6. Functions to load old irte data from the old kernel to the kdump kernel.
> 7. Some code changes that support the other behaviors that have been listed.
> 8. In the new functions, use physical addresses as "unsigned long" type, not pointers.
>
> Original version by Bill Sumner:
> https://lkml.org/lkml/2014/1/10/518
> https://lkml.org/lkml/2014/4/15/716
> https://lkml.org/lkml/2014/4/24/836
>
> Zhenhua's updates:
> https://lkml.org/lkml/2014/10/21/134
> https://lkml.org/lkml/2014/12/15/121
> https://lkml.org/lkml/2014/12/22/53
> https://lkml.org/lkml/2015/1/6/1166
> https://lkml.org/lkml/2015/1/12/35
>
> Changelog[v9]:
> 1. Add new function iommu_attach_domain_with_id.
> 2. Do not copy old page tables, keep using the old ones.
> 3. Remove functions:
>    intel_iommu_did_to_domain_values_entry
>    intel_iommu_get_dids_from_old_kernel
>    device_to_domain_id
>    copy_page_addr
>    copy_page_table
>    copy_context_entry
>    copy_context_entry_table
> 4. Add new function device_to_existing_context_entry.
>
> Changelog[v8]:
> 1. Add a missing __iommu_flush_cache in function copy_page_table.
>
> Changelog[v7]:
> 1. Use __iommu_flush_cache to flush the data to hardware.
>
> Changelog[v6]:
> 1. Use "unsigned long" as the type of physical addresses.
> 2. Use new function unmap_device_dma to unmap the old dma.
> 3. Fix some small incorrect bit orders for the aw shift.
>
> Changelog[v5]:
> 1. Do not disable a
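[Editorial note: step 4 of the DMA-remapping plan above (malloc new context entry tables and copy the old data) can be sketched as a user-space simulation. The struct layouts below are deliberately simplified stand-ins, not the real VT-d entries; the real code accesses the old kernel's tables through the physical addresses held in the old root entry table.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the VT-d root/context entries (the real
 * structures in the kernel are defined differently and hold physical
 * addresses, not pointers). */
struct ctx_entry  { uint64_t lo, hi; };
struct root_entry { struct ctx_entry *ctx; };

/* Allocate a fresh context table owned by the kdump kernel and copy
 * the old kernel's entries into it, then point the root entry at the
 * new table (step 7 of the plan): in-flight DMA keeps resolving, but
 * the memory backing the table now belongs to the new kernel. */
static struct ctx_entry *copy_context_table(struct root_entry *re,
                                            size_t nr_entries)
{
    struct ctx_entry *new_ctx = calloc(nr_entries, sizeof(*new_ctx));
    if (!new_ctx)
        return NULL;
    memcpy(new_ctx, re->ctx, nr_entries * sizeof(*new_ctx));
    re->ctx = new_ctx;   /* save the new table's address in the root entry */
    return new_ctx;
}
```

In the real driver this copy would also need a cache flush (cf. the `__iommu_flush_cache` fix in Changelog[v8]) so the hardware sees the new table contents.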
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
> To fix this problem, we modify the behavior of the intel vt-d in the crashdump kernel:
>
> For DMA Remapping:
> 1. Accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the translation, keep it enabled.
> 3. Use the old root entry table, do not rewrite the RTA register.
> 4. Malloc and use new context entry tables, copy data from the old ones that were used by the old kernel.

I have not read all the patches, but I have a question; not sure whether this has been answered before. Old memory is not reliable, so what if the old memory gets corrupted before the panic? Is it safe to continue using it in the 2nd kernel? I worry that it will cause problems.

Hope I'm wrong though.

Thanks
Dave
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
The hardware will do some verification, but not completely. If people think the OS should also do this, then it should be another patchset, I think.

Thanks
Zhenhua

> On Apr 3, 2015, at 17:21, Dave Young wrote:
>
> > On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> > [snip]
>
> It is too late, there will be no chance to save the vmcore then.
>
> Also if it is possible to continue to corrupt other areas of oldmem because of using the old iommu tables then it will cause more problems.
>
> So I think the tables at least need some verification before being used.
>
> [snip]
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
Hi Dave,

There may be some possibilities that the old iommu data is corrupted by some other modules. Currently we do not have a better solution for the dmar faults.

But I think when this happens, we need to fix the module that corrupted the old iommu data. I once met a similar problem in a normal kernel, the queue used by the qi_* functions was written again by another module. The fix was in that module, not in the iommu module.

Thanks
Zhenhua

On 04/03/2015 04:40 PM, Dave Young wrote:
> [snip]
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 04/03/2015 04:28 PM, Dave Young wrote:
> On 03/19/15 at 01:36pm, Li, Zhen-Hua wrote:
> > This patchset is an update of Bill Sumner's patchset, and implements a fix for:
> > If a kernel boots with intel_iommu=on on a system that supports intel vt-d, when a panic happens, the kdump kernel will boot with these faults:
>
> Zhenhua, I will review the patchset soon, sorry for jumping in late.
>
> Thanks
> Dave

Hi Dave,

Thanks for your review. And please also take a look at the plan I sent in another mail: change the use of

    if (is_kdump_kernel()) { }

to:

    if (iommu_enabled_in_last_kernel) { }
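[Editorial note: a minimal sketch of what an iommu_enabled_in_last_kernel-style check could look like, assuming it reads the translation-enable status (TES) bit from the VT-d global status register. The MMIO register set is simulated with a plain buffer, and the function name follows Zhenhua's proposal rather than any merged code.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* VT-d global status register: TES (translation enable status) is
 * bit 31 of GSTS, at offset 0x1c in the register set, per the VT-d
 * specification. Simulated here with a plain buffer instead of MMIO. */
#define DMAR_GSTS_REG  0x1c
#define DMA_GSTS_TES   (1U << 31)

/* The idea behind the rename: trust the hardware state, not just the
 * boot mode -- old tables only need to be reused when the previous
 * kernel actually left translation enabled. */
static int iommu_enabled_in_last_kernel(const uint8_t *regs)
{
    uint32_t gsts;
    memcpy(&gsts, regs + DMAR_GSTS_REG, sizeof(gsts));
    return !!(gsts & DMA_GSTS_TES);
}
```

This also covers the case mentioned in the cover letter where interrupt remapping is active even without intel_iommu=on, since the decision keys off hardware state rather than the command line.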
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
Hi Joerg,

This is quite strange. I checked the patches from patch 01 to 10 using ./scripts/checkpatch.pl under the kernel source directory, but got 0 errors and 0 warnings. There are only some white spaces in cover letter 00, but it could not be checked by this script.

However, I checked intel-iommu.c using "checkpatch.pl -f" and found too many warnings and errors. Maybe we need a new patch to fix them.

Thanks
Zhenhua

On 04/02/2015 07:11 PM, Joerg Roedel wrote:
> Hi Zhen-Hua,
>
> On Thu, Mar 19, 2015 at 01:36:18PM +0800, Li, Zhen-Hua wrote:
> > This patchset is an update of Bill Sumner's patchset, and implements a fix for:
> > If a kernel boots with intel_iommu=on on a system that supports intel vt-d, when a panic happens, the kdump kernel will boot with these faults:
>
> I reviewed this patch-set and it is getting closer to a point where it could be merged. I found a few white-space errors in the review, please do a checkpatch run on the next round and fix these.
>
> Besides that, and given some third-party testing and reviews, I think we can look forward to merging it early after the merge window for v4.1, to give it enough testing in -next too.
>
> Joerg
Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
Hi Zhen-Hua,

On Thu, Mar 19, 2015 at 01:36:18PM +0800, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset, and implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports intel vt-d, when a panic happens, the kdump kernel will boot with these faults:

I reviewed this patch-set and it is getting closer to a point where it could be merged. I found a few white-space errors in the review, please do a checkpatch run on the next round and fix these.

Besides that, and given some third-party testing and reviews, I think we can look forward to merging it early after the merge window for v4.1, to give it enough testing in -next too.

Joerg