* John Snow (js...@redhat.com) wrote:
> 
> 
> On 12/09/2014 01:15 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
> >
> >(With the previous atapi_dma flag recovery)
> >If migration happens between the ATAPI command being written and the
> >bmdma being started, the DMA is dropped.  Eventually the guest times
> >out and recovers, but that can take many seconds.
> >(This is rare: on a pingpong test reading the CD continuously I hit
> >this in about 1 in 30 to 1 in 50 migrations)
> >
> >I don't think we've got enough state to be able to recover safely
> >at this point, so I raise a 'medium error, no seek complete',
> >on the assumption that guests will try to recover from what looks
> >like a dirty CD.
> >
> >OK, it's a hack, the real solution is probably to push a lot of
> >ATAPI state into the migration stream, but this is a fix that
> >works with no stream changes. Tested only on Linux (both RHEL5
> >(pre-libata) and RHEL7).
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
> >---
> >  hw/ide/atapi.c    | 17 +++++++++++++++++
> >  hw/ide/internal.h |  2 ++
> >  hw/ide/pci.c      | 11 +++++++++++
> >  3 files changed, 30 insertions(+)
> >
> >diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> >index c63b7e5..e17799c 100644
> >--- a/hw/ide/atapi.c
> >+++ b/hw/ide/atapi.c
> >@@ -394,6 +394,23 @@ static void ide_atapi_cmd_read(IDEState *s, int lba, int nb_sectors,
> >      }
> >  }
> >
> >+
> >+/* Called by *_restart_bh when the transfer function points
> >+ * to ide_atapi_cmd
> >+ */
> >+void ide_atapi_dma_restart(IDEState *s)
> >+{
> >+    /*
> >+     * I'm not sure we have enough stored to restart the command
> >+     * safely, so give the guest an error it should recover from.
> >+     * I'm assuming most guests will try to recover from something
> >+     * listed as a medium error on a CD; it seems to work on Linux.
> >+     * This would be more of a problem if we did any other type of
> >+     * DMA operation.
> >+     */
> >+    ide_atapi_cmd_error(s, MEDIUM_ERROR, ASC_NO_SEEK_COMPLETE);
> >+}
> >+
> 
> Is this safe for non-data commands? Can we even get there in such a case?

See below.

> >  static inline uint8_t ide_atapi_set_profile(uint8_t *buf, uint8_t *index,
> >                                              uint16_t profile)
> >  {
> >diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> >index 8a3eca4..8b65285 100644
> >--- a/hw/ide/internal.h
> >+++ b/hw/ide/internal.h
> >@@ -289,6 +289,7 @@ typedef struct IDEDMAOps IDEDMAOps;
> >  #define ATAPI_INT_REASON_TAG            0xf8
> >
> >  /* same constants as bochs */
> >+#define ASC_NO_SEEK_COMPLETE                 0x02
> >  #define ASC_ILLEGAL_OPCODE                   0x20
> >  #define ASC_LOGICAL_BLOCK_OOR                0x21
> >  #define ASC_INV_FIELD_IN_CMD_PACKET          0x24
> >@@ -529,6 +530,7 @@ void ide_dma_error(IDEState *s);
> >
> >  void ide_atapi_cmd_ok(IDEState *s);
> >  void ide_atapi_cmd_error(IDEState *s, int sense_key, int asc);
> >+void ide_atapi_dma_restart(IDEState *s);
> >  void ide_atapi_io_error(IDEState *s, int ret);
> >
> >  void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val);
> >diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> >index bee5ad3..e3f2054 100644
> >--- a/hw/ide/pci.c
> >+++ b/hw/ide/pci.c
> >@@ -235,6 +235,17 @@ static void bmdma_restart_bh(void *opaque)
> >          }
> >      } else if (error_status & IDE_RETRY_FLUSH) {
> >          ide_flush_cache(bmdma_active_if(bm));
> >+    } else {
> >+        IDEState *s = bmdma_active_if(bm);
> >+
> >+        /*
> >+         * We've not got any bits to tell us about ATAPI - but
> >+         * we do have the end_transfer_func that tells us what
> >+         * we're trying to do.
> >+         */
> >+        if (s->end_transfer_func == ide_atapi_cmd) {
> >+            ide_atapi_dma_restart(s);
> >+        }
> 
> OK, so when the restart routines get invoked we add a hook to see if we were
> in the middle of an ATAPI command and acknowledge that we don't know how to
> properly handle this.

As to your question above about non-data commands: hmm, probably, but how
do I guard it? I guess I could check the atapi_dma flag that the previous
patch fixed.
(This is all probably still broken for non-DMA ATAPI transfers.)
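A minimal sketch of that guard, modelled outside QEMU so it can be tried in isolation. The cut-down IDEState, the stub handlers and atapi_restart_needs_error() are hypothetical stand-ins; atapi_dma is assumed to mean "this packet command will transfer its data by DMA", as restored by the previous patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down model of IDEState: only the fields the guard reads. */
typedef struct IDEState IDEState;
typedef void EndTransferFunc(IDEState *s);

struct IDEState {
    EndTransferFunc *end_transfer_func; /* what the device will do next */
    bool atapi_dma;                     /* assumed: packet command uses DMA */
};

/* Stub standing in for QEMU's ide_atapi_cmd(). */
static void ide_atapi_cmd(IDEState *s)
{
    (void)s;
}

/* Stub standing in for any other end-of-transfer handler. */
static void other_transfer_func(IDEState *s)
{
    (void)s;
}

/*
 * True only when a packet command is pending AND it was going to use DMA,
 * i.e. the case where the bmdma start was lost across migration.
 * PIO (non-DMA) ATAPI commands are left alone by this check.
 */
static bool atapi_restart_needs_error(const IDEState *s)
{
    assert(s != NULL);
    return s->end_transfer_func == ide_atapi_cmd && s->atapi_dma;
}
```

In pci.c the existing `s->end_transfer_func == ide_atapi_cmd` test would then become this combined condition; whether atapi_dma is reliable at that point is exactly the open question above.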

> Isn't this going to run on every vmstate change, though?

There aren't many: only starting/stopping the CPU does it, and bmdma_restart_cb
guards it with an early 'if (!running) return', so it only fires when the CPU
starts running again.

> I think we don't
> clear out end_transfer_func on success, so this might fire off more than we
> want it to, although I guess end_transfer_func is usually going to get set
> to ide_atapi_cmd_reply_end if it finishes normally ...

Right, or if ide_transfer_stop is called.
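A sketch of why a stale ide_atapi_cmd pointer should not survive a completed transfer, assuming (as in QEMU at the time) that ide_transfer_stop() points end_transfer_func back at itself. The one-field IDEState, packet_cmd_start() and restart_would_fire() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct IDEState IDEState;
typedef void EndTransferFunc(IDEState *s);

/* Hypothetical one-field model of IDEState. */
struct IDEState {
    EndTransferFunc *end_transfer_func;
};

/* Stub standing in for QEMU's ide_atapi_cmd(). */
static void ide_atapi_cmd(IDEState *s)
{
    (void)s;
}

/* Modelled on ide_transfer_stop(): once the transfer ends, the pending
 * handler no longer points at ide_atapi_cmd. */
static void ide_transfer_stop(IDEState *s)
{
    assert(s != NULL);
    s->end_transfer_func = ide_transfer_stop;
}

/* Hypothetical PACKET start: the device waits for the command packet
 * and will hand it to ide_atapi_cmd when it arrives. */
static void packet_cmd_start(IDEState *s)
{
    assert(s != NULL);
    s->end_transfer_func = ide_atapi_cmd;
}

/* The check the patch adds to bmdma_restart_bh(). */
static bool restart_would_fire(const IDEState *s)
{
    return s->end_transfer_func == ide_atapi_cmd;
}
```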

> >      }
> >  }
> >
> >
> 
> Indeed a hack, but it's probably appropriate: if our code cannot in fact
> handle ATAPI migration, throwing an error or disabling migration is the
> correct thing to do, but I don't think users would be very happy with the
> second option. I feel that this is an OK workaround because it should not
> introduce spurious errors or retries for cases where we manage to avoid
> migrating in the middle of the loop. This will at least let the currently
> broken case limp along until we fix it more properly.
> 
> What makes me the most curious is how this plays out in Windows if this case
> is triggered. Throw a trace around the fake error and see if you can't
> observe it getting called during a pingpong test while Windows reads a CD.

Yeah, I'm going to figure out how to try that.
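One hedged way to do that: count and log each time the fake error fires. A real patch would more likely add an event to QEMU's trace-events file; IDEStateModel and both *_model/*_traced functions are hypothetical. The constants are the standard SCSI values (sense key 3h is MEDIUM ERROR, ASC 02h is NO SEEK COMPLETE).

```c
#include <assert.h>
#include <stdio.h>

/* Standard SCSI values used by the patch. */
enum { MEDIUM_ERROR = 0x03, ASC_NO_SEEK_COMPLETE = 0x02 };

/* Hypothetical model holding the reported sense and a hit counter. */
typedef struct {
    int sense_key;
    int asc;
    unsigned fake_errors; /* how many times the restart hack fired */
} IDEStateModel;

/* Stand-in for ide_atapi_cmd_error(): just records the sense data. */
static void ide_atapi_cmd_error_model(IDEStateModel *s, int sense_key, int asc)
{
    s->sense_key = sense_key;
    s->asc = asc;
}

/* The hack from the patch with a debug line around the fake error, so a
 * pingpong run shows whether, and how often, the path is taken. */
static void ide_atapi_dma_restart_traced(IDEStateModel *s)
{
    assert(s != NULL);
    s->fake_errors++;
    fprintf(stderr,
            "ide_atapi_dma_restart: faking MEDIUM ERROR/NO SEEK COMPLETE (#%u)\n",
            s->fake_errors);
    ide_atapi_cmd_error_model(s, MEDIUM_ERROR, ASC_NO_SEEK_COMPLETE);
}
```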

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
