Re: [Qemu-devel] [PATCH V4 10/12] SD card users: optimize access to SDClass methods
On 27 July 2012 20:29, Igor Mitsyanko i.mitsya...@samsung.com wrote: Rather that repeatedly call SD_GET_CLASS() in a loop, call it once before a loop starts. Anthony claims that SD_GET_CLASS should be cheap enough that we don't need to hoist it out of loops like this. Do you have profiling data or similar that caused you to write this patch? -- PMM
Re: [Qemu-devel] [PATCH V4 10/12] SD card users: optimize access to SDClass methods
On 07/31/2012 07:43 PM, Peter Maydell wrote: On 27 July 2012 20:29, Igor Mitsyanko i.mitsya...@samsung.com wrote: Rather that repeatedly call SD_GET_CLASS() in a loop, call it once before a loop starts. Anthony claims that SD_GET_CLASS should be cheap enough that we don't need to hoist it out of loops like this. Do you have profiling data or similar that caused you to write this patch? -- PMM Well, I've tested it by measuring an execution time of a 4Kb write to SD card, results showed that arithmetic mean of time for one 4k write was less by ~300us in sequence with SD_GET_CLASS extracted from the loop. Although I ran this test several times, I have little faith in test methodology and results, it obviously showed significant dispersion between measured time of distinct 4K writes (200-300% if I recall correctly). I really have no objection no objection to dropping this patch.
Re: [Qemu-devel] [PATCH V4 10/12] SD card users: optimize access to SDClass methods
On 31 July 2012 18:33, Igor Mitsyanko i.mitsya...@samsung.com wrote: On 07/31/2012 07:43 PM, Peter Maydell wrote: Anthony claims that SD_GET_CLASS should be cheap enough that we don't need to hoist it out of loops like this. Do you have profiling data or similar that caused you to write this patch? Well, I've tested it by measuring an execution time of a 4Kb write to SD card, results showed that arithmetic mean of time for one 4k write was less by ~300us in sequence with SD_GET_CLASS extracted from the loop. How much is that as a % of the total time for the write ? -- PMM
Re: [Qemu-devel] [PATCH V4 10/12] SD card users: optimize access to SDClass methods
On 07/31/2012 09:47 PM, Peter Maydell wrote: On 31 July 2012 18:33, Igor Mitsyanko i.mitsya...@samsung.com wrote: On 07/31/2012 07:43 PM, Peter Maydell wrote: Anthony claims that SD_GET_CLASS should be cheap enough that we don't need to hoist it out of loops like this. Do you have profiling data or similar that caused you to write this patch? Well, I've tested it by measuring an execution time of a 4Kb write to SD card, results showed that arithmetic mean of time for one 4k write was less by ~300us in sequence with SD_GET_CLASS extracted from the loop. How much is that as a % of the total time for the write ? -- PMM total write time was ~3.5-4.0 ms I think, so its ~0.08%
Re: [Qemu-devel] [PATCH V4 10/12] SD card users: optimize access to SDClass methods
Igor Mitsyanko i.mitsya...@samsung.com writes: On 07/31/2012 07:43 PM, Peter Maydell wrote: On 27 July 2012 20:29, Igor Mitsyanko i.mitsya...@samsung.com wrote: Rather that repeatedly call SD_GET_CLASS() in a loop, call it once before a loop starts. Anthony claims that SD_GET_CLASS should be cheap enough that we don't need to hoist it out of loops like this. Do you have profiling data or similar that caused you to write this patch? -- PMM Well, I've tested it by measuring an execution time of a 4Kb write to SD card, results showed that arithmetic mean of time for one 4k write was less by ~300us in sequence with SD_GET_CLASS extracted from the loop. How many loop iterations that? 300us is a huge amount of time, unless you were looping on every byte, I have a hard time understanding that delta. Regards, Anthony Liguori Although I ran this test several times, I have little faith in test methodology and results, it obviously showed significant dispersion between measured time of distinct 4K writes (200-300% if I recall correctly). I really have no objection no objection to dropping this patch.
Re: [Qemu-devel] [PATCH V4 10/12] SD card users: optimize access to SDClass methods
On 07/31/2012 10:15 PM, Anthony Liguori wrote: How many loop iterations that? 300us is a huge amount of time, unless you were looping on every byte, I have a hard time understanding that delta. I'm not sure I understand what you mean, I did loop on every byte, that's how our SD model works. So, for 4K bytes qemu have to call SD_GET_CLASS 4K times. Or you're asking about write sequence length? It was 10 writes of 4096 bytes (not statistically reliable, I have to agree).
[Qemu-devel] [PATCH V4 10/12] SD card users: optimize access to SDClass methods
Rather that repeatedly call SD_GET_CLASS() in a loop, call it once before a loop starts. Signed-off-by: Igor Mitsyanko i.mitsya...@samsung.com --- hw/omap_mmc.c|9 + hw/pl181.c |7 --- hw/pxa2xx_mmci.c |6 -- 3 files changed, 13 insertions(+), 9 deletions(-) diff --git a/hw/omap_mmc.c b/hw/omap_mmc.c index aec0285..8ceb25b 100644 --- a/hw/omap_mmc.c +++ b/hw/omap_mmc.c @@ -219,6 +219,7 @@ static void omap_mmc_command(struct omap_mmc_s *host, int cmd, int dir, static void omap_mmc_transfer(struct omap_mmc_s *host) { +SDClass *sd_class = SD_GET_CLASS(host-card); uint8_t value; if (!host-transfer) @@ -229,10 +230,10 @@ static void omap_mmc_transfer(struct omap_mmc_s *host) if (host-fifo_len host-af_level) break; -value = sd_read_data(host-card); +value = sd_class-read_data(host-card); host-fifo[(host-fifo_start + host-fifo_len) 31] = value; if (-- host-blen_counter) { -value = sd_read_data(host-card); +value = sd_class-read_data(host-card); host-fifo[(host-fifo_start + host-fifo_len) 31] |= value 8; host-blen_counter --; @@ -244,10 +245,10 @@ static void omap_mmc_transfer(struct omap_mmc_s *host) break; value = host-fifo[host-fifo_start] 0xff; -sd_write_data(host-card, value); +sd_class-write_data(host-card, value); if (-- host-blen_counter) { value = host-fifo[host-fifo_start] 8; -sd_write_data(host-card, value); +sd_class-write_data(host-card, value); host-blen_counter --; } diff --git a/hw/pl181.c b/hw/pl181.c index 7d91fbb..8cd898c 100644 --- a/hw/pl181.c +++ b/hw/pl181.c @@ -209,18 +209,19 @@ error: static void pl181_fifo_run(pl181_state *s) { +SDClass *sd_class = SD_GET_CLASS(s-card); uint32_t bits; uint32_t value = 0; int n; int is_read; is_read = (s-datactrl PL181_DATA_DIRECTION) != 0; -if (s-datacnt != 0 (!is_read || sd_data_ready(s-card)) +if (s-datacnt != 0 (!is_read || sd_class-data_ready(s-card)) !s-linux_hack) { if (is_read) { n = 0; while (s-datacnt s-fifo_len PL181_FIFO_LEN) { -value |= (uint32_t)sd_read_data(s-card) (n * 8); +value |= (uint32_t)sd_class-read_data(s-card) (n * 8); s-datacnt--; n++; if (n == 4) { @@ -241,7 +242,7 @@ static void pl181_fifo_run(pl181_state *s) } n--; s-datacnt--; -sd_write_data(s-card, value 0xff); +sd_class-write_data(s-card, value 0xff); value = 8; } } diff --git a/hw/pxa2xx_mmci.c b/hw/pxa2xx_mmci.c index b505a4c..af19c36 100644 --- a/hw/pxa2xx_mmci.c +++ b/hw/pxa2xx_mmci.c @@ -117,12 +117,14 @@ static void pxa2xx_mmci_int_update(PXA2xxMMCIState *s) static void pxa2xx_mmci_fifo_update(PXA2xxMMCIState *s) { +SDClass *sd_class = SD_GET_CLASS(s-card); + if (!s-active) return; if (s-cmdat CMDAT_WR_RD) { while (s-bytesleft s-tx_len) { -sd_write_data(s-card, s-tx_fifo[s-tx_start ++]); +sd_class-write_data(s-card, s-tx_fifo[s-tx_start ++]); s-tx_start = 0x1f; s-tx_len --; s-bytesleft --; @@ -132,7 +134,7 @@ static void pxa2xx_mmci_fifo_update(PXA2xxMMCIState *s) } else while (s-bytesleft s-rx_len 32) { s-rx_fifo[(s-rx_start + (s-rx_len ++)) 0x1f] = -sd_read_data(s-card); +sd_class-read_data(s-card); s-bytesleft --; s-intreq |= INT_RXFIFO_REQ; } -- 1.7.5.4