from:"Maxim Levitsky"

Re: [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()

2023-12-13 Thread Maxim Levitsky

On Wed, 2023-12-13 at 14:31 -0800, Jim Mattson wrote:
> On Wed, Dec 13, 2023 at 2:25 PM Maxim Levitsky  wrote:
> > On Tue, 2023-12-12 at 07:28 -0800, Sean Christopherson wrote:
> > > On Sun, Dec 10, 2023, Jim Mattson wrote:
> > > > On Thu, Dec 7, 2023 at 8:21 AM Sean Christopherson  
> > > > wrote:
> > > > > Doh.  We got the less obvious cases and missed the obvious one.
> > > > > 
> > > > > Ugh, and we also missed a related mess in 
> > > > > kvm_guest_apic_has_interrupt().  That
> > > > > thing should really be folded into vmx_has_nested_events().
> > > > > 
> > > > > Good gravy.  And vmx_interrupt_blocked() does the wrong thing because 
> > > > > that
> > > > > specifically checks if L1 interrupts are blocked.
> > > > > 
> > > > > Compile tested only, and definitely needs to be chunked into multiple 
> > > > > patches,
> > > > > but I think something like this mess?
> > > > 
> > > > The proposed patch does not fix the problem. In fact, it messes things
> > > > up so much that I don't get any test results back.
> > > 
> > > Drat.
> > > 
> > > > Google has an internal K-U-T test that demonstrates the problem. I
> > > > will post it soon.
> > > 
> > > Received, I'll dig in soonish, though "soonish" might unfortunately might 
> > > mean
> > > 2024.
> > > 
> > 
> > Hi,
> > 
> > So this is what I think:
> > 
> > 
> > KVM does have kvm_guest_apic_has_interrupt() for this exact purpose,
> > to check if nested APICv has a pending interrupt before halting.
> > 
> > 
> > However the problem is bigger - with APICv we have in essence 2 pending 
> > interrupt
> > bitmaps - the PIR and the IRR, and to know if the guest has a pending 
> > interrupt
> > one has in theory to copy PIR to IRR, then see if the max is larger then 
> > the current PPR.
> > 
> > Since we don't want to write to guest memory, and the IRR here resides in 
> > the guest memory,
> > I guess we have to do a 'dry-run' version of 
> > 'vmx_complete_nested_posted_interrupt' and call
> > it from  kvm_guest_apic_has_interrupt().
> > 
> > What do you think? I can prepare a patch for this.
> > 
> > Can you share a reproducer or write a new one that can be shared?
> 
> See https://lore.kernel.org/kvm/20231211185552.3856862-1-jmatt...@google.com/.

Thank you!

Best regards,
Maxim Levitsky

> 
> > Best regards,
> > Maxim Levitsky
> >

Re: [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()

2023-12-13 Thread Maxim Levitsky

On Tue, 2023-12-12 at 07:28 -0800, Sean Christopherson wrote:
> On Sun, Dec 10, 2023, Jim Mattson wrote:
> > On Thu, Dec 7, 2023 at 8:21 AM Sean Christopherson  
> > wrote:
> > > Doh.  We got the less obvious cases and missed the obvious one.
> > > 
> > > Ugh, and we also missed a related mess in kvm_guest_apic_has_interrupt(). 
> > >  That
> > > thing should really be folded into vmx_has_nested_events().
> > > 
> > > Good gravy.  And vmx_interrupt_blocked() does the wrong thing because that
> > > specifically checks if L1 interrupts are blocked.
> > > 
> > > Compile tested only, and definitely needs to be chunked into multiple 
> > > patches,
> > > but I think something like this mess?
> > 
> > The proposed patch does not fix the problem. In fact, it messes things
> > up so much that I don't get any test results back.
> 
> Drat.
> 
> > Google has an internal K-U-T test that demonstrates the problem. I
> > will post it soon.
> 
> Received, I'll dig in soonish, though "soonish" might unfortunately might mean
> 2024.
> 

Hi,

So this is what I think:

KVM does have kvm_guest_apic_has_interrupt() for this exact purpose,
to check if nested APICv has a pending interrupt before halting.

However the problem is bigger - with APICv we have in essence 2 pending 
interrupt
bitmaps - the PIR and the IRR, and to know if the guest has a pending interrupt
one has in theory to copy PIR to IRR, then see if the max is larger then the 
current PPR.

Since we don't want to write to guest memory, and the IRR here resides in the 
guest memory,
I guess we have to do a 'dry-run' version of 
'vmx_complete_nested_posted_interrupt' and call
it from  kvm_guest_apic_has_interrupt().

What do you think? I can prepare a patch for this.

Can you share a reproducer or write a new one that can be shared?

Best regards,
Maxim Levitsky

Re: [PATCH 3/3] KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused

2022-04-19 Thread Maxim Levitsky

On Tue, 2022-04-19 at 15:45 +, Sean Christopherson wrote:
> On Tue, Apr 19, 2022, Maxim Levitsky wrote:
> > On Fri, 2022-04-15 at 00:43 +, Sean Christopherson wrote:
> > > Add wrappers to acquire/release KVM's SRCU lock when stashing the index
> > > in vcpu->src_idx, along with rudimentary detection of illegal usage,
> > > e.g. re-acquiring SRCU and thus overwriting vcpu->src_idx.  Because the
> > > SRCU index is (currently) either 0 or 1, illegal nesting bugs can go
> > > unnoticed for quite some time and only cause problems when the nested
> > > lock happens to get a different index.
> > > 
> > > Wrap the WARNs in PROVE_RCU=y, and make them ONCE, otherwise KVM will
> > > likely yell so loudly that it will bring the kernel to its knees.
> > > 
> > > Signed-off-by: Sean Christopherson 
> > > ---
> 
> ...
> 
> > Looks good to me overall.
> > 
> > Note that there are still places that acquire the lock and store the idx 
> > into
> > a local variable, for example kvm_xen_vcpu_set_attr and such.
> > I didn't check yet if these should be converted as well.
> 
> Using a local variable is ok, even desirable.  Nested/multiple readers is not 
> an
> issue, the bug fixed by patch 1 is purely that kvm_vcpu.srcu_idx gets 
> corrupted.

Makes sense. I still recal *that* bug in AVIC inhibition where that srcu lock 
was
a major PITA, but now I remember that it was not due to nesting of the lock,
but rather fact that we attempted to call syncronize_srcu or something like that
with it held.


> 
> In an ideal world, KVM would _only_ track the SRCU index in local variables, 
> but
> that would require plumbing the local variable down into vcpu_enter_guest() 
> and
> kvm_vcpu_block() so that SRCU can be unlocked prior to entering the guest or
> scheduling out the vCPU.
> 
It all makes sense now - thanks.

Best regards,
Maxim Levistky

Re: [PATCH 3/3] KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused

2022-04-19 Thread Maxim Levitsky

smp_store_release(&vcpu->mode, IN_GUEST_MODE);
>  
> - srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> + kvm_vcpu_srcu_read_unlock(vcpu);
>  
>   /*
>* 1) We should set ->mode before checking ->requests.  Please see
> @@ -10188,7 +10188,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   smp_wmb();
>   local_irq_enable();
>   preempt_enable();
> - vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> + kvm_vcpu_srcu_read_lock(vcpu);
>   r = 1;
>   goto cancel_injection;
>   }
> @@ -10314,7 +10314,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   local_irq_enable();
>   preempt_enable();
>  
> - vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> + kvm_vcpu_srcu_read_lock(vcpu);
>  
>   /*
>* Profile KVM exit RIPs:
> @@ -10344,7 +10344,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  }
>  
>  /* Called within kvm->srcu read side.  */
> -static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
> +static inline int vcpu_block(struct kvm_vcpu *vcpu)
>  {
>   bool hv_timer;
>  
> @@ -10360,12 +10360,12 @@ static inline int vcpu_block(struct kvm *kvm, 
> struct kvm_vcpu *vcpu)
>   if (hv_timer)
>   kvm_lapic_switch_to_sw_timer(vcpu);
>  
> - srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> + kvm_vcpu_srcu_read_unlock(vcpu);
>   if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
>   kvm_vcpu_halt(vcpu);
>   else
>   kvm_vcpu_block(vcpu);
> - vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> + kvm_vcpu_srcu_read_lock(vcpu);
>  
>   if (hv_timer)
>   kvm_lapic_switch_to_hv_timer(vcpu);
> @@ -10407,7 +10407,6 @@ static inline bool kvm_vcpu_running(struct kvm_vcpu 
> *vcpu)
>  static int vcpu_run(struct kvm_vcpu *vcpu)
>  {
>   int r;
> - struct kvm *kvm = vcpu->kvm;
>  
>   vcpu->arch.l1tf_flush_l1d = true;
>  
> @@ -10415,7 +10414,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>   if (kvm_vcpu_running(vcpu)) {
>   r = vcpu_enter_guest(vcpu);
>   } else {
> - r = vcpu_block(kvm, vcpu);
> + r = vcpu_block(vcpu);
>   }
>  
>   if (r <= 0)
> @@ -10437,9 +10436,9 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>   }
>  
>   if (__xfer_to_guest_mode_work_pending()) {
> - srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> + kvm_vcpu_srcu_read_unlock(vcpu);
>   r = xfer_to_guest_mode_handle_work(vcpu);
> - vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> + kvm_vcpu_srcu_read_lock(vcpu);
>   if (r)
>   return r;
>   }
> @@ -10542,7 +10541,6 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  {
>   struct kvm_run *kvm_run = vcpu->run;
> - struct kvm *kvm = vcpu->kvm;
>   int r;
>  
>   vcpu_load(vcpu);
> @@ -10550,7 +10548,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>   kvm_run->flags = 0;
>   kvm_load_guest_fpu(vcpu);
>  
> - vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> + kvm_vcpu_srcu_read_lock(vcpu);
>   if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
>   if (kvm_run->immediate_exit) {
>   r = -EINTR;
> @@ -10562,9 +10560,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>*/
>   WARN_ON_ONCE(kvm_lapic_hv_timer_in_use(vcpu));
>  
> - srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> + kvm_vcpu_srcu_read_unlock(vcpu);
>   kvm_vcpu_block(vcpu);
> - vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> + kvm_vcpu_srcu_read_lock(vcpu);
>  
>   if (kvm_apic_accept_events(vcpu) < 0) {
>   r = 0;
> @@ -10625,7 +10623,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>   if (kvm_run->kvm_valid_regs)
>   store_regs(vcpu);
>   post_kvm_run_save(vcpu);
> - srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> + kvm_vcpu_srcu_read_unlock(vcpu);
>  
>   kvm_sigset_deactivate(vcpu);
>   vcpu_put(vcpu);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 252ee4a61b58..76fc7233bd6a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -315,7 +315,10 @@ struct kvm_vcpu {
>   int cpu;
>   int vcpu_id; /* id given by userspace at creation */
>   int vcpu_idx; /* index in kvm->vcpus array */
> - int srcu_idx;
> + int srcu_idx; /* Don't use this directly.  You've been warned. */
> +#ifdef CONFIG_PROVE_RCU
> + int srcu_depth;
> +#endif
>   int mode;
>   u64 requests;
>   unsigned long guest_debug;
> @@ -841,6 +844,25 @@ static inline void kvm_vm_bugged(struct kvm *kvm)
>   unlikely(__ret);\
>  })
>  
> +static inline void kvm_vcpu_srcu_read_lock(struct kvm_vcpu *vcpu)
> +{
> +#ifdef CONFIG_PROVE_RCU
> + WARN_ONCE(vcpu->srcu_depth++,
> +   "KVM: Illegal vCPU srcu_idx LOCK, depth=%d", vcpu->srcu_depth 
> - 1);
> +#endif
> + vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> +}
> +
> +static inline void kvm_vcpu_srcu_read_unlock(struct kvm_vcpu *vcpu)
> +{
> + srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> +
> +#ifdef CONFIG_PROVE_RCU
> + WARN_ONCE(--vcpu->srcu_depth,
> +   "KVM: Illegal vCPU srcu_idx UNLOCK, depth=%d", 
> vcpu->srcu_depth);
> +#endif
> +}
> +
>  static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm)
>  {
>   return !!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET);


Looks good to me overall.

Note that there are still places that acquire the lock and store the idx into
a local variable, for example kvm_xen_vcpu_set_attr and such.
I didn't check yet if these should be converted as well.

Best regards,
Maxim Levitsky

Re: [PATCH 1/3] KVM: x86: Don't re-acquire SRCU lock in complete_emulated_io()

2022-04-19 Thread Maxim Levitsky

On Fri, 2022-04-15 at 00:43 +, Sean Christopherson wrote:
> Don't re-acquire SRCU in complete_emulated_io() now that KVM acquires the
> lock in kvm_arch_vcpu_ioctl_run().  More importantly, don't overwrite
> vcpu->srcu_idx.  If the index acquired by complete_emulated_io() differs
> from the one acquired by kvm_arch_vcpu_ioctl_run(), KVM will effectively
> leak a lock and hang if/when synchronize_srcu() is invoked for the
> relevant grace period.
> 
> Fixes: 8d25b7beca7e ("KVM: x86: pull kvm->srcu read-side to 
> kvm_arch_vcpu_ioctl_run")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/x86.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ab336f7c82e4..f35fe09de59d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10450,12 +10450,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  
>  static inline int complete_emulated_io(struct kvm_vcpu *vcpu)
>  {
> - int r;
> -
> - vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> - r = kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE);
> - srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> - return r;
> + return kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE);
>  }
>  
>  static int complete_emulated_pio(struct kvm_vcpu *vcpu)

I wonder how this did work

Reviewed-by: Maxim Levitsky 

Best regards,
Maxim Levitsky

Re: [PATCH] synchronize_irq needs a barrier

2007-10-19 Thread Maxim Levitsky

On Saturday 20 October 2007 07:46:24 Benjamin Herrenschmidt wrote:
> 
> > I probably need to add this synchronize_irq() logic in dmfe.c too, but I 
> > probably do it later,
> > I think I am overestimating this race, since most drivers don't do 
> > dev->insuspend checks in IRQ handler.
> > Maybe even just use free_irq() after all
> 
> Most drivers are probably underestimating the race :-)
> 
> free_irq() would work provided that you did the masking on chip before
> (and unmask only after request_irq on the way back in). But it's a bit
> like using a 10 tons truck to crush an ant... 
Agreed.

So, I will add synchronize_irq() to both saa7134, and dmfe, the two drivers 
that their .suspend/.resume 
routines were written by me.

I already added a synchronize_irq()  plus few more fixes to the driver , but 
those patches are still in v4l tree.

I now has this:

saa_writel(SAA7134_IRQ1, 0);
saa_writel(SAA7134_IRQ2, 0);
saa_writel(SAA7134_MAIN_CTRL, 0);

synchronize_irq(pci_dev->irq);
dev->insuspend = 1;

and I will probably need (with the synchronize_irq patch applied)


/* Disable interrupts, DMA, and rest of the chip*/
saa_writel(SAA7134_IRQ1, 0);
saa_writel(SAA7134_IRQ2, 0);
saa_writel(SAA7134_MAIN_CTRL, 0);

dev->insuspend = 1;
synchronize_irq(pci_dev->irq);

/* ACK pending interrupts just in case*/
saa_writel(SAA7134_IRQ_REPORT,saa_readl(SAA7134_IRQ_REPORT));

......
This should be bullet-proof.


> 
> Ben.
> 
> 
> 
Best regards,
Maxim Levitsky
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] synchronize_irq needs a barrier

2007-10-19 Thread Maxim Levitsky

On Saturday 20 October 2007 07:04:35 Benjamin Herrenschmidt wrote:
> 
> > 1) some drivers use pci_disable_device(), and pci_enable_device().
> > should I use it too?
> 
> I generally don't do the former, and I would expect the late to be done
> by pci_restore_state() for you. pci_disable_device(), last I looked,
> only cleared the bus master bit though, which might be a good idea to
> do.
> 
> People in ACPI/x86 land, are there more good reasons to do one or the
> other ?
> 
> That reminds me that I volunteered to write a documentation on how
> drivers should do all that stuff at KS and didn't get to actually doing
> it yet. shame ... I'll try to start something asap.
> 
> > 2) I accidentally did this:
> > pci_set_power_state(pci_dev, pci_choose_state(pci_dev, state));
> > pci_save_state(pci_dev);
> > 
> > I somehow thought that this is correct, that I should save the pci config 
> > state
> > after the power-down, but now I know that it isn't correct.
> 
> Right, you need to do the save_state before the power down. You need to
> avoid pretty much any access to the device after the save state other
> than the pending set_power_state on resume.
> 
> > I now need to send a patch for dmfe.c network driver that has the same 
> > commands written by me.
> > (but it works perfectly anyway)
> 
> On x86 desktop... might have surprises on others.
> 
> > Is it possible to access pci configuration space in D3?
> 
> It's only really safe to access the PM register itself, though I suppose
> you should be able to walk the capability chain to do that. But I
> wouldnt recommend doing anything else.
> 
> > And lastly speaking of network drivers, one issue came to my mind:
> > most network drivars has a packet queue and in case of dmfe it is located 
> > in main memory,
> > and card does dma from it.
> 
> Note that the network stack nowadays does a fair bit of cleaning up for
> you before your suspend routine is called
> > 
> > in .suspend I ignore that some packets may be in that queue, and I want
> > to ask, whenever there are better ways to do that.
> > 
> > 
> > this is my dmfe .suspend routine.
> > 
> > /* Disable upper layer interface */
> > netif_device_detach(dev);
> 

> 
> Looks allright on a quick glance appart from the bits we already
> discussed.


> 
> > I guess, everybody makes mistakes... :-)
> > 
> > Other network drivers has a bit more complicated .suspend/.resume routines, 
> > but I didn't see a driver waiting for output queue to finish
> 
> I think the network stack does that nowadays but we'll have to double
> check, that's based on what DaveM told me at KS.
> 
> Ben. 
> 
> 

Hi,

Thanks a lot.
I fix the order of calls in dmfe.c
and in saa7134-core.c.

I probably need to add this synchronize_irq() logic in dmfe.c too, but I 
probably do it later,
I think I am overestimating this race, since most drivers don't do 
dev->insuspend checks in IRQ handler.
Maybe even just use free_irq() after all


Best regards,
Maxim Levitsky
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] synchronize_irq needs a barrier

2007-10-19 Thread Maxim Levitsky

On Saturday 20 October 2007 05:56:01 Benjamin Herrenschmidt wrote:
> 
> > I have read this thread and I concluded few things:
> > 
> > 1) It is impossible to know that the card won't send more interrupts:
> > Even if I do a read from the device, the IRQ can be pending in the bus/APIC
> > It is even possible (and likely) that the IRQ line will be shared, thus the 
> > handler can be called by non-relevant device.
> > 
> > 2) the synchronize_irq(); in .suspend is useless:
> > an IRQ can happen immediately after this synchronize_irq();
> > and interrupt even the .suspend()
> > (At least theoretically)
> 
> It's not totally useless not. It guarantees that by the time your
> are out of your suspend(), a simultaneous instance of the IRQ handler
> is either finished, or it (or any subsequent occurence) have seen
> your insuspend flag set to 1.
> 
> > Thus I now understand that .suspend() should do:
> > 
> > saa_writel(SAA7134_IRQ1, 0);
> > saa_writel(SAA7134_IRQ2, 0);
> > saa_writel(SAA7134_MAIN_CTRL, 0);
> > 
> > dev->insuspend = 1;
> > smp_wmb();
> > 
> > /* at that point the _request to disable card's IRQs was issued, we 
> > don't know 
> >that there will be no irqs anymore.
> >the smp_mb(); guaranties that the IRQ handler will bail out in that 
> > case. */
> > 
> > /* ...*/
> > 
> > pci_save_state(pci_dev);
> > pci_set_power_state(pci_dev, pci_choose_state(pci_dev, state));
> > return 0;
> 
> The above doesn't handle the case where the IRQ handle just passed the
> if (insuspend) test. You may end up calling pci_set_power_state() while
> in the middle of some further HW accesses in the handler.
> 
> You still need synchronize_irq() for that reason. Or use a spinlock.
> 
> > and the interrupt handler:
> > 
> > smp_rmb();
> > if (dev->insuspend)
> > goto out;
> > 
> > Am I right?
> 
> Not quite :-)
> 
> Also not that the work we are doing with synchronize_irq, if it gets
> merged, renders it unnecessary for you to add those two memory barriers,
> synchronize_irq will pretty much do the ordering for you.
> 
> Cheers,
> Ben.
> 
> 


Fine, I Agree, and this is why I used it in first place, I just forgot.
To wait for already running IRQ handler, that tested the dev->insuspend.
(And I assumed that interrupts are disabled on the cpu that runs .suspend, but 
I know now
that this isn't true.)

Besides that can I ask few more .suspend/resume questions:

1) some drivers use pci_disable_device(), and pci_enable_device().
should I use it too?

2) I accidentally did this:
pci_set_power_state(pci_dev, pci_choose_state(pci_dev, state));
pci_save_state(pci_dev);

I somehow thought that this is correct, that I should save the pci config state
after the power-down, but now I know that it isn't correct.
I now need to send a patch for dmfe.c network driver that has the same commands 
written by me.
(but it works perfectly anyway)
Is it possible to access pci configuration space in D3?

And lastly speaking of network drivers, one issue came to my mind:
most network drivars has a packet queue and in case of dmfe it is located in 
main memory,
and card does dma from it.


in .suspend I ignore that some packets may be in that queue, and I want
to ask, whenever there are better ways to do that.


this is my dmfe .suspend routine.

/* Disable upper layer interface */
netif_device_detach(dev);

/* Disable Tx/Rx */
db->cr6_data &= ~(CR6_RXSC | CR6_TXSC);
update_cr6(db->cr6_data, dev->base_addr);

/* Disable Interrupt */
outl(0, dev->base_addr + DCR7);
outl(inl (dev->base_addr + DCR5), dev->base_addr + DCR5);

/* Fre RX buffers */
dmfe_free_rxbuffer(db);

/* Enable WOL */
pci_read_config_dword(pci_dev, 0x40, &tmp);
tmp &= ~(DMFE_WOL_LINKCHANGE|DMFE_WOL_MAGICPACKET);

if (db->wol_mode & WAKE_PHY)
tmp |= DMFE_WOL_LINKCHANGE;
if (db->wol_mode & WAKE_MAGIC)
tmp |= DMFE_WOL_MAGICPACKET;

pci_write_config_dword(pci_dev, 0x40, tmp);

pci_enable_wake(pci_dev, PCI_D3hot, 1);
    pci_enable_wake(pci_dev, PCI_D3cold, 1);

/* Power down device*/
pci_set_power_state(pci_dev, pci_choose_state (pci_dev,state));
pci_save_state(pci_dev);


I guess, everybody makes mistakes... :-)

Other network drivers has a bit more complicated .suspend/.resume routines, 
but I didn't see a driver waiting for output queue to finish

Best regards,
Maxim Levitsky


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] synchronize_irq needs a barrier

2007-10-19 Thread Maxim Levitsky

On Saturday 20 October 2007 04:25:34 Linus Torvalds wrote:
> 
> On Sat, 20 Oct 2007, Maxim Levitsky wrote:
> > 
> > and the interrupt handler:
> > 
> > smp_rmb();
> > if (dev->insuspend)
> > goto out;
> 
> Something like that can work, yes. However, you need to make sure that:
> 
>  - even when you ignore the interrupt (because the driver doesn't care, 
>it's suspending), you need to make sure the hardware gets shut up by 
>reading (or writing) the proper interrupt status register.
I agree, but while device is powered off, its registers can't be accessed
Thus, if I ack the IRQ every time the handler is called, I will access the 
powered off device (this is probably won't hurt a lot, but a bit incorrectly)

> 
>Otherwise, with a level interrupt, the interrupt will continue to be 
>held active ("screaming") and the CPU will get interrupted over and 
>over again, until the irq subsystem will just turn the irq off 
>entirely.
> 
>  - when you resume, make sure that you get the engine going again, with 
>the understanding that some interrupts may have gotten ignored.
Here it isn't a problem, this is a video capture card, and I suspend it by just 
stopping dma
on all active buffers even if in the middle of capture, and I send those 
buffers to card again
to fill them from the beginning during the resume.
> 
> Also, in general, these kinds of things shouldn't always even be 
> neicessary. If you use the suspend_late()/resume_early() hooks, those will 
> be called with interrupts off, and while they can be harder to debug (and 
> may be worth avoiding for non-critical drivers), they do allow for simpler 
> models partly exactly because they don't need to worry about interrupts 
> etc.
Exactly, I am aware of suspend_late , but I don't want to abuse it.
> 
>   Linus
> 


Best regards,
Maxim Levitsky

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] synchronize_irq needs a barrier

2007-10-19 Thread Maxim Levitsky

On Thursday 18 October 2007 03:25:42 Benjamin Herrenschmidt wrote:
> synchronize_irq needs at the very least a compiler barrier and a
> read barrier on SMP, but there are enough cases around where a
> write barrier is also needed and it's not a hot path so I prefer
> using a full smp_mb() here.
> 
> It will degrade to a compiler barrier on !SMP.
> 
> Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
> ---
> 
> Index: linux-work/kernel/irq/manage.c
> ===
> --- linux-work.orig/kernel/irq/manage.c   2007-10-18 11:22:16.0 
> +1000
> +++ linux-work/kernel/irq/manage.c2007-10-18 11:22:20.0 +1000
> @@ -33,6 +33,7 @@ void synchronize_irq(unsigned int irq)
>   if (irq >= NR_IRQS)
>   return;
>  
> + smp_mb();
>   while (desc->status & IRQ_INPROGRESS)
>   cpu_relax();
>  }
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Hi,

I have read this thread and I concluded few things:

1) It is impossible to know that the card won't send more interrupts:
Even if I do a read from the device, the IRQ can be pending in the bus/APIC
It is even possible (and likely) that the IRQ line will be shared, thus the 
handler can be called by non-relevant device.

2) the synchronize_irq(); in .suspend is useless:
an IRQ can happen immediately after this synchronize_irq();
and interrupt even the .suspend()
(At least theoretically)

Thus I now understand that .suspend() should do:

saa_writel(SAA7134_IRQ1, 0);
saa_writel(SAA7134_IRQ2, 0);
saa_writel(SAA7134_MAIN_CTRL, 0);

dev->insuspend = 1;
smp_wmb();

/* at that point the _request to disable card's IRQs was issued, we 
don't know 
   that there will be no irqs anymore.
   the smp_mb(); guaranties that the IRQ handler will bail out in that 
case. */

/* ...*/

pci_save_state(pci_dev);
pci_set_power_state(pci_dev, pci_choose_state(pci_dev, state));
return 0;

and the interrupt handler:

smp_rmb();
if (dev->insuspend)
goto out;

Am I right?
Best regards,
Maxim Levitsky
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()

Re: [PATCH v4 10/12] KVM: x86: never write to memory from kvm_vcpu_check_block()

Re: [PATCH 3/3] KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused

Re: [PATCH 3/3] KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused

Re: [PATCH 1/3] KVM: x86: Don't re-acquire SRCU lock in complete_emulated_io()

Re: [PATCH] synchronize_irq needs a barrier

Re: [PATCH] synchronize_irq needs a barrier

Re: [PATCH] synchronize_irq needs a barrier

Re: [PATCH] synchronize_irq needs a barrier

Re: [PATCH] synchronize_irq needs a barrier

10 matches

Site Navigation

Mail list logo

Footer information