On 06/21/2016 12:25 AM, Dmitry Monakhov wrote:
Maxim Patlasov <mpatla...@virtuozzo.com> writes:

Dima,

I agree with general approach of this patch, but there are some
(easy-to-fix) issues. See, please, inline comments below...

On 06/20/2016 11:58 AM, Dmitry Monakhov wrote:
barrier code is broken in many ways:
Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
But request also can goes though ->dio_submit_alloc()->dio_submit_pad and 
write_page (for indexes)
So in case of grow_dev we have following sequance:

E_RELOC_DATA_READ:
               ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
                ->delta->allocate
                   ->io->submit_allloc: dio_submit_alloc
                     ->dio_submit_pad
E_DATA_WBI : data written, time to update index
                ->delta->allocate_complete:ploop_index_update
                  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
                  ->write_page
                  ->ploop_map_wb_complete
                    ->ploop_wb_complete_post_process
                      ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
E_RELOC_NULLIFY:

                 ->submit()

BUG#2: currecntly kaio write_page silently ignores REQ_FUA
Sorry, I can't agree, it actually does not ignore:
I've misstyped. I ment to say REQ_FLUSH.
static void
kaio_write_page(struct ploop_io * io, struct ploop_request * preq,
          struct page * page, sector_t sec, int fua)
{
     /* No FUA in kaio, convert it to fsync */
     if (fua)
         set_bit(PLOOP_REQ_KAIO_FSYNC, &preq->state);

BUG#3: io_direct:dio_submit  if fua_delay is not possible we MUST tag all bios 
via REQ_FUA
         not just latest one.
No need to tag *all*. See inline comments below.

This patch unify barrier handling like follows:
- Get rid of FORCE_{FLUSH,FUA}
- Introduce DELAYED_FLUSH, currecntly it supported only by io_direct
- fix up fua handling for dio_submit

This makes reloc sequence optimal:
io_direct
RELOC_S: R1, W2, WBI:FLUSH|FUA
RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA
io_kaio
RELOC_S: R1, W2:FUA, WBI:FUA
RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA

https://jira.sw.ru/browse/PSBM-47107
Signed-off-by: Dmitry Monakhov <dmonak...@openvz.org>
---
   drivers/block/ploop/dev.c       |  8 +++++---
   drivers/block/ploop/io_direct.c | 29 +++++++++-----------------
   drivers/block/ploop/io_kaio.c   | 17 ++++++++++------
   drivers/block/ploop/map.c       | 45 
++++++++++++++++++++++-------------------
   include/linux/ploop/ploop.h     |  8 ++++----
   5 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 96f7850..fbc5f2f 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * 
preq)
__TRACE("Z %p %u\n", preq, preq->req_cluster); + if (!preq->error) {
+               WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state));
+       }
        while (preq->bl.head) {
                struct bio * bio = preq->bl.head;
                preq->bl.head = bio->bi_next;
@@ -2530,9 +2533,8 @@ restart:
                top_delta = ploop_top_delta(plo);
                sbl.head = sbl.tail = preq->aux_bio;
- /* Relocated data write required sync before BAT updatee */
-               set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
-
+               /* Relocated data write required sync before BAT updatee
+                * this will happen inside index_update */
                if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
                        preq->eng_state = PLOOP_E_DATA_WBI;
                        plo->st.bio_out++;
diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index a6d83fe..d7ecd4a 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -90,21 +90,12 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
        trace_submit(preq);
preflush = !!(rw & REQ_FLUSH);
-
-       if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state))
-               preflush = 1;
-
-       if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state))
-               postfua = 1;
-
-       if (!postfua && ploop_req_delay_fua_possible(rw, preq)) {
-
+       postfua = !!(rw & REQ_FUA);
+       if (ploop_req_delay_fua_possible(rw, preq)) {
                /* Mark req that delayed flush required */
-               set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state);
-       } else if (rw & REQ_FUA) {
-               postfua = 1;
+               set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state);
+               postfua = 0;
        }
"postfua" is a horrible name, let us see if we can get rid of it
completely. Also, the way how ploop_req_delay_fua_possible implemented
is prone to errors (see below an issue in kaio_complete_io_state). Let's
rework it like this:
Let it be postflush.
static inline bool ploop_req_delay_fua_possible(struct ploop_request
*preq)
{
     return preq->eng_state == PLOOP_E_DATA_WBI;
}
Then, that chunk in the dio_submit above might look as:

     /* If we can delay, mark req that delayed flush required */
     if ((rw & REQ_FUA) && ploop_req_delay_fua_possible(preq))
         set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state);


-
        rw &= ~(REQ_FLUSH | REQ_FUA);
@@ -238,14 +229,15 @@ flush_bio:
                        rw2 |= REQ_FLUSH;
                        preflush = 0;
                }
-               if (unlikely(postfua && !bl.head))
-                       rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
+               /* Very unlikely, but correct.
+                * TODO: Optimize postfua via DELAY_FLUSH for any req state */
+               if (unlikely(!postfua))
+                       rw2 |= REQ_FUA;
If test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state), do nothing here --
that's obvious. So let's assume !test_bit(PLOOP_REQ_DELAYED_FLUSH,
&preq->state)...

There is a call to bio_add_page in dio_submit before that loop. The
moment bio_add_page returned zero (i.e. OK), we know for sure which bio
it came from: it is bw.cur. Then it's enough to mark new bio like this:
OK
     bio->bi_rw |= bw.cur & REQ_FUA;
Then, instead of "submit_bio(rw2, b);", use "submit_bio(rw2 | b->bi_rw,
b);".

This way we'll mark with FUA only those of new bio-s that has at least
some data (pages) from original bio-s with FUA. Makes sense?

Btw, dio_io_page() must be fixed similarly (get rid of postfua, mark
with FUA only what we really need).

ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);
                submit_bio(rw2, b);
                bio_num++;
        }
-
I liked this empty line, please keep it! ;)

        ploop_complete_io_request(preq);
        return;
@@ -1520,15 +1512,14 @@ dio_read_page(struct ploop_io * io, struct ploop_request * preq, static void
   dio_write_page(struct ploop_io * io, struct ploop_request * preq,
-              struct page * page, sector_t sec, int fua)
+              struct page * page, sector_t sec, unsigned long rw)
   {
        if (!(io->files.file->f_mode & FMODE_WRITE)) {
                PLOOP_FAIL_REQUEST(preq, -EBADF);
                return;
        }
- dio_io_page(io, WRITE | (fua ? REQ_FUA : 0) | REQ_SYNC,
-                   preq, page, sec);
+       dio_io_page(io, rw | WRITE | REQ_SYNC, preq, page, sec);
   }
static int
diff --git a/drivers/block/ploop/io_kaio.c b/drivers/block/ploop/io_kaio.c
index de26319..0b93e13 100644
--- a/drivers/block/ploop/io_kaio.c
+++ b/drivers/block/ploop/io_kaio.c
@@ -72,13 +72,17 @@ static void kaio_complete_io_state(struct ploop_request * 
preq)
        }
/* Convert requested fua to fsync */
-       if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state) ||
+       if (test_and_clear_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state) ||
io_kaio has nothing to do with DELAYED_FLUSH (only dio sets it). Hence,
no need to check it here.
Why not. delayed flush is possible for !reloc writes. In such cases
we may postpone data flush till index_update. So from kaio point of view
it will looks like follows: W1,WBI,FSYNC. AFAIU this is legal because
even if power failure happens we may have WBI, but not W1. In such case
index will point to zeroed data

OK, then we must have corresponding code in kaio_submit setting DELAYED_FLUSH.

Max
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel

Reply via email to