Re: [PATCH v10 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-04-24 Thread Dave Young
On 04/24/15 at 04:35pm, Baoquan He wrote:
> On 04/24/15 at 04:25pm, Dave Young wrote:
> > Hi, Baoquan
> > 
> > > I support this patchset.
> > > 
> > > We should not fear oldmem since reserved crashkernel region is similar.
> > > No one can guarantee that any crazy code won't step into crashkernel
> > > region just because 1st kernel says it's reversed for kdump kernel. Here
> > > the root table and context tables are also not built to allow legal code
> > > to danamge. Both of them has the risk to be corrupted, for trying our
> > > best to get a dumped vmcore the risk is worth being taken.
> > 
> > old mem is mapped in 1st kernel so compare with the reserved crashkernel
> > they are more likely to be corrupted. they are totally different. 
> 
> Could you tell how and why they are different? Wrong code will choose
> root tables and context tables to danamge when they totally lose
> control?

iommu will map io address to system ram, right? not to reserved ram, but
yes I'm assuming the page table is right, but I was worrying they are corrupted
while kernel panic is happening.

> 
> > 
> > > 
> > > And the resetting pci way has been NACKed by David Woodhouse, the
> > > maintainer of intel iommu. Because the place calling the resetting pci
> > > code is ugly before kdump kernel or in kdump kernel. And as he said a
> > > certain device made mistakes why we blame on all devices. We should fix
> > > that device who made mistakes. 
> > 
> > Resetting pci bus is not ugly than fixing a problem with risk and to fix
> > the problem it introduced in the future.
> 
> There's a problem, we fix the problem. If that's uglier, I need redefine
> the 'ugly' in my personal dict. You mean the problem it could introduce
> is wrong code will damage root table and context tables, why don't we
> fix that wrong code, but blame innocent context tables? So you mean
> these tables should deserve being damaged by wrong code?

I'm more than happy to see this issue can be fixed in the patchset, I do not
agree to add the code there with such problems. OTOH, for now seems there's
no way to fix it.

> 
> > 
> > I know it is late to speak out, but sorry I still object and have to NACK 
> > this
> > oldmem approach from my point.
> > 
> > Thanks
> > Dave
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-04-24 Thread Baoquan He
On 04/24/15 at 04:25pm, Dave Young wrote:
> Hi, Baoquan
> 
> > I support this patchset.
> > 
> > We should not fear oldmem since reserved crashkernel region is similar.
> > No one can guarantee that any crazy code won't step into crashkernel
> > region just because 1st kernel says it's reversed for kdump kernel. Here
> > the root table and context tables are also not built to allow legal code
> > to danamge. Both of them has the risk to be corrupted, for trying our
> > best to get a dumped vmcore the risk is worth being taken.
> 
> old mem is mapped in 1st kernel so compare with the reserved crashkernel
> they are more likely to be corrupted. they are totally different. 

Could you tell how and why they are different? Wrong code will choose
root tables and context tables to danamge when they totally lose
control?

> 
> > 
> > And the resetting pci way has been NACKed by David Woodhouse, the
> > maintainer of intel iommu. Because the place calling the resetting pci
> > code is ugly before kdump kernel or in kdump kernel. And as he said a
> > certain device made mistakes why we blame on all devices. We should fix
> > that device who made mistakes. 
> 
> Resetting pci bus is not ugly than fixing a problem with risk and to fix
> the problem it introduced in the future.

There's a problem, we fix the problem. If that's uglier, I need redefine
the 'ugly' in my personal dict. You mean the problem it could introduce
is wrong code will damage root table and context tables, why don't we
fix that wrong code, but blame innocent context tables? So you mean
these tables should deserve being damaged by wrong code?

> 
> I know it is late to speak out, but sorry I still object and have to NACK this
> oldmem approach from my point.
> 
> Thanks
> Dave
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-04-24 Thread Dave Young
Hi, Baoquan

> I support this patchset.
> 
> We should not fear oldmem since reserved crashkernel region is similar.
> No one can guarantee that any crazy code won't step into crashkernel
> region just because 1st kernel says it's reversed for kdump kernel. Here
> the root table and context tables are also not built to allow legal code
> to danamge. Both of them has the risk to be corrupted, for trying our
> best to get a dumped vmcore the risk is worth being taken.

old mem is mapped in 1st kernel so compare with the reserved crashkernel
they are more likely to be corrupted. they are totally different. 

> 
> And the resetting pci way has been NACKed by David Woodhouse, the
> maintainer of intel iommu. Because the place calling the resetting pci
> code is ugly before kdump kernel or in kdump kernel. And as he said a
> certain device made mistakes why we blame on all devices. We should fix
> that device who made mistakes. 

Resetting pci bus is not ugly than fixing a problem with risk and to fix
the problem it introduced in the future.

I know it is late to speak out, but sorry I still object and have to NACK this
oldmem approach from my point.

Thanks
Dave
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-04-24 Thread Baoquan He
On 04/15/15 at 02:48pm, Dave Young wrote:
> On 04/15/15 at 01:47pm, Li, ZhenHua wrote:
> > On 04/15/2015 08:57 AM, Dave Young wrote:
> > >Again, I think it is bad to use old page table, below issues need consider:
> > >1) make sure old page table are reliable across crash
> > >2) do not allow writing oldmem after crash
> > >
> > >Please correct me if I'm wrong, or if above is not doable I think I will 
> > >vote for
> > >resetting pci bus.
> > >
> > >Thanks
> > >Dave
> > >
> > Hi Dave,
> > 
> > When updating the context tables, we have to write their address to root
> > tables, this will cause writing to old mem.
> > 
> > Resetting the pci bus has been discussed, please check this:
> > http://lists.infradead.org/pipermail/kexec/2014-October/012752.html
> > https://lkml.org/lkml/2014/10/21/890

I support this patchset.

We should not fear oldmem since reserved crashkernel region is similar.
No one can guarantee that any crazy code won't step into crashkernel
region just because 1st kernel says it's reversed for kdump kernel. Here
the root table and context tables are also not built to allow legal code
to danamge. Both of them has the risk to be corrupted, for trying our
best to get a dumped vmcore the risk is worth being taken.

And the resetting pci way has been NACKed by David Woodhouse, the
maintainer of intel iommu. Because the place calling the resetting pci
code is ugly before kdump kernel or in kdump kernel. And as he said a
certain device made mistakes why we blame on all devices. We should fix
that device who made mistakes. 

As for me, periodically poked by customers to ask how iommu fix is
going, I really think this patchset is good enough. Aren't we going to
do thing just because there's a risk with tiny possibility or not perfect
enough. I think people won't agree. Otherwise kdump could have been
killed when author proposed it since crashkernel reserved region is
risky and could be corrupted by 1st kernel.

Anyway, let's comprimise a little. At worst it can be reverted if it's
not satisfactory.

Personal opinion.

By the way, I tested it and it works well on my HP z420 workstation.

Thanks
Baoquan


> 
> I know one reason to use old pgtable is this looks better because it fixes the
> real problem, but it is not a good way if it introduce more problems because 
> of
> it have to use oldmem. I will be glad if this is not a problem but I have not
> been convinced.
> 
> OTOH, there's many types of iommu, intel, amd, a lot of other types. They need
> their own fixes, so it looks not that elegant.
> 
> For pci reset, it is not perfect, but it has another advantage, the patch is
> simpler. The problem I see from the old discusssion is, reset bus in 2nd 
> kernel
> is acceptable but it does not fix things on sparc platform. AFAIK current 
> reported
> problems are intel and amd iommu, at least pci reset stuff does not make it 
> worse.
> 
> Thanks
> Dave
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu