Re: [PATCH] e1000e: using bottom half to send packets

Li Qiang Sun, 19 Jul 2020 21:42:49 -0700

Jason Wang <jasow...@redhat.com> 于2020年7月20日周一 下午12:00写道：
>
>
> On 2020/7/17 下午11:46, Li Qiang wrote:
> > Jason Wang <jasow...@redhat.com> 于2020年7月17日周五 下午1:39写道：
> >>
> >> On 2020/7/17 下午12:46, Li Qiang wrote:
> >>> Jason Wang <jasow...@redhat.com> 于2020年7月17日周五 上午11:10写道：
> >>>> On 2020/7/17 上午12:14, Li Qiang wrote:
> >>>>> Alexander Bulekov reported a UAF bug related e1000e packets send.
> >>>>>
> >>>>> -->https://bugs.launchpad.net/qemu/+bug/1886362
> >>>>>
> >>>>> This is because the guest trigger a e1000e packet send and set the
> >>>>> data's address to e1000e's MMIO address. So when the e1000e do DMA
> >>>>> it will write the MMIO again and trigger re-entrancy and finally
> >>>>> causes this UAF.
> >>>>>
> >>>>> Paolo suggested to use a bottom half whenever MMIO is doing complicate
> >>>>> things in here:
> >>>>> -->https://lists.nongnu.org/archive/html/qemu-devel/2020-07/msg03342.html
> >>>>>
> >>>>> Reference here:
> >>>>> 'The easiest solution is to delay processing of descriptors to a bottom
> >>>>> half whenever MMIO is doing something complicated.  This is also better
> >>>>> for latency because it will free the vCPU thread more quickly and leave
> >>>>> the work to the I/O thread.'
> >>>> I think several things were missed in this patch (take virtio-net as a
> >>>> reference), do we need the following things:
> >>>>
> >>> Thanks Jason,
> >>> In fact I know this, I'm scared for touching this but I want to try.
> >>> Thanks for your advice.
> >>>
> >>>> - Cancel the bh when VM is stopped.
> >>> Ok. I think add a vm state change notifier for e1000e can address this.
> >>>
> >>>> - A throttle to prevent bh from executing too much timer?
> >>> Ok, I think add a config timeout and add a timer in e1000e can address 
> >>> this.
> >>
> >> Sorry, a typo. I meant we probably need a tx_burst as what virtio-net did.
> >>
> >>
> >>>> - A flag to record whether or not this a pending tx (and migrate it?)
> >>> Is just a flag enough? Could you explain more about the idea behind
> >>> processing the virtio-net/e1000e using bh like this?
> >>
> >> Virtio-net use a tx_waiting variable to record whether or not there's a
> >> pending bh. (E.g bh is cancelled due to vmstop, we need reschedule it
> >> after vmresume). Maybe we can do something simpler by just schecule bh
> >> unconditionally during vm resuming.
> >>
> >>
> >>> For example, if the guest trigger a lot of packets send and if the bh
> >>> is scheduled in IO thread. So will we lost packets?
> >>
> >> We don't since we don't populate virtqueue which means packets are
> >> queued there.
> >>
> > This remind of me a question:
> > If we use tx_burst like in virtion-net. For detail:
> > If we sent out  'tx_burst' packets per bh. Then we set 'tx_waiting' and
> > then schedule another bh. However if between two bh schedule, the guest 
> > change
> > the e1000e register such 'r->dh' 'r->dlen'. The data is fully corrupted.
> > In fact this issue does exist in my origin patch.  That's
> > What if following happend:
> >
> > vcpu thread: guest write e1000e MMIO to trigger packets send
> > vcpu thread: schedule a bh
> > vcpu thread: return
> > IO thread: begin to run the bh and start send packets
> > vcpu thread: write register again such as  'r->dh' 'r->dlen'..
> >
> > So here the IO thread and vcpu thread will race the register?
> >
> > If I remember correctly, the virtio net has no such problem because it
> > uses ring buffer
> > and the backedn(virtio device) uses the shadow index to index the ring
> > buffer data.
> >
> > What's your idea here?
>
>
> I think we serialize them through bql? (qemu_mutex_lock_iothread())
>


Ok I will try to cook a patch tonight based our discussion.

Thanks,
Li Qiang

> Thanks
>
>
> >
> > Thanks,
> > Li Qiang
> >
> >
> >> Thanks
> >>
> >>
> >>> How we avoid this in virtio-net.
> >>>
> >>> Thanks,
> >>> Li Qiang
> >>>
> >>>
> >>>
> >>>> Thanks
> >>>>
> >>>>
> >>>>> This patch fixes this UAF.
> >>>>>
> >>>>> Signed-off-by: Li Qiang <liq...@163.com>
> >>>>> ---
> >>>>>     hw/net/e1000e_core.c | 25 +++++++++++++++++--------
> >>>>>     hw/net/e1000e_core.h |  2 ++
> >>>>>     2 files changed, 19 insertions(+), 8 deletions(-)
> >>>>>
> >>>>> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
> >>>>> index bcd186cac5..6165b04b68 100644
> >>>>> --- a/hw/net/e1000e_core.c
> >>>>> +++ b/hw/net/e1000e_core.c
> >>>>> @@ -2423,32 +2423,27 @@ e1000e_set_dbal(E1000ECore *core, int index, 
> >>>>> uint32_t val)
> >>>>>     static void
> >>>>>     e1000e_set_tctl(E1000ECore *core, int index, uint32_t val)
> >>>>>     {
> >>>>> -    E1000E_TxRing txr;
> >>>>>         core->mac[index] = val;
> >>>>>
> >>>>>         if (core->mac[TARC0] & E1000_TARC_ENABLE) {
> >>>>> -        e1000e_tx_ring_init(core, &txr, 0);
> >>>>> -        e1000e_start_xmit(core, &txr);
> >>>>> +        qemu_bh_schedule(core->tx[0].tx_bh);
> >>>>>         }
> >>>>>
> >>>>>         if (core->mac[TARC1] & E1000_TARC_ENABLE) {
> >>>>> -        e1000e_tx_ring_init(core, &txr, 1);
> >>>>> -        e1000e_start_xmit(core, &txr);
> >>>>> +        qemu_bh_schedule(core->tx[1].tx_bh);
> >>>>>         }
> >>>>>     }
> >>>>>
> >>>>>     static void
> >>>>>     e1000e_set_tdt(E1000ECore *core, int index, uint32_t val)
> >>>>>     {
> >>>>> -    E1000E_TxRing txr;
> >>>>>         int qidx = e1000e_mq_queue_idx(TDT, index);
> >>>>>         uint32_t tarc_reg = (qidx == 0) ? TARC0 : TARC1;
> >>>>>
> >>>>>         core->mac[index] = val & 0xffff;
> >>>>>
> >>>>>         if (core->mac[tarc_reg] & E1000_TARC_ENABLE) {
> >>>>> -        e1000e_tx_ring_init(core, &txr, qidx);
> >>>>> -        e1000e_start_xmit(core, &txr);
> >>>>> +        qemu_bh_schedule(core->tx[qidx].tx_bh);
> >>>>>         }
> >>>>>     }
> >>>>>
> >>>>> @@ -3322,6 +3317,16 @@ e1000e_vm_state_change(void *opaque, int 
> >>>>> running, RunState state)
> >>>>>         }
> >>>>>     }
> >>>>>
> >>>>> +static void e1000e_core_tx_bh(void *opaque)
> >>>>> +{
> >>>>> +    struct e1000e_tx *tx = opaque;
> >>>>> +    E1000ECore *core = tx->core;
> >>>>> +    E1000E_TxRing txr;
> >>>>> +
> >>>>> +    e1000e_tx_ring_init(core, &txr, tx - &core->tx[0]);
> >>>>> +    e1000e_start_xmit(core, &txr);
> >>>>> +}
> >>>>> +
> >>>>>     void
> >>>>>     e1000e_core_pci_realize(E1000ECore     *core,
> >>>>>                             const uint16_t *eeprom_templ,
> >>>>> @@ -3340,6 +3345,8 @@ e1000e_core_pci_realize(E1000ECore     *core,
> >>>>>         for (i = 0; i < E1000E_NUM_QUEUES; i++) {
> >>>>>             net_tx_pkt_init(&core->tx[i].tx_pkt, core->owner,
> >>>>>                             E1000E_MAX_TX_FRAGS, core->has_vnet);
> >>>>> +        core->tx[i].core = core;
> >>>>> +        core->tx[i].tx_bh = qemu_bh_new(e1000e_core_tx_bh, 
> >>>>> &core->tx[i]);
> >>>>>         }
> >>>>>
> >>>>>         net_rx_pkt_init(&core->rx_pkt, core->has_vnet);
> >>>>> @@ -3367,6 +3374,8 @@ e1000e_core_pci_uninit(E1000ECore *core)
> >>>>>         for (i = 0; i < E1000E_NUM_QUEUES; i++) {
> >>>>>             net_tx_pkt_reset(core->tx[i].tx_pkt);
> >>>>>             net_tx_pkt_uninit(core->tx[i].tx_pkt);
> >>>>> +        qemu_bh_delete(core->tx[i].tx_bh);
> >>>>> +        core->tx[i].tx_bh = NULL;
> >>>>>         }
> >>>>>
> >>>>>         net_rx_pkt_uninit(core->rx_pkt);
> >>>>> diff --git a/hw/net/e1000e_core.h b/hw/net/e1000e_core.h
> >>>>> index aee32f7e48..94ddc6afc2 100644
> >>>>> --- a/hw/net/e1000e_core.h
> >>>>> +++ b/hw/net/e1000e_core.h
> >>>>> @@ -77,6 +77,8 @@ struct E1000Core {
> >>>>>             unsigned char sum_needed;
> >>>>>             bool cptse;
> >>>>>             struct NetTxPkt *tx_pkt;
> >>>>> +        QEMUBH *tx_bh;
> >>>>> +        E1000ECore *core;
> >>>>>         } tx[E1000E_NUM_QUEUES];
> >>>>>
> >>>>>         struct NetRxPkt *rx_pkt;
>

Re: [PATCH] e1000e: using bottom half to send packets

Reply via email to