Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v3
Dima, I'm uneasy that we still have handling RELOC_A|S broken. It seems we have full agreement that for such requests we can do unconditional FLUSH|FUA when we call write_page from ploop_index_update() and map_wb_complete(). And your idea to implement it by passing FLUSH|FUA for io_direct and post_fsync=1 for io_kaio is smart and OK. Will you send patch for that (fix barriers for RELOC_A|S requests)? Thanks, Maxim On 06/21/2016 04:56 PM, Maxim Patlasov wrote: Dima, After more thinking I realized that the whole idea of PLOOP_REQ_DELAYED_FLUSH might be bogus: it is possible that we simply do not have many enough incoming FUA-s to make delaying lucrative. This patch actually mixes three things: 1) fix barriers for RELOC_A|S requests, 2) fix barriers for ordinary requests, 3) DELAYED_FLUSH optimization. So, please, split the patch into three and make some measurements demonstrating that applying "DELAYED_FLUSH optimization" patch on top of previous patches improves performance. I have an idea about how to fix barriers for ordinary requests -- see please the patch I'll send soon. The key point is that handling FLUSH-es is broken the same way as FUA: if you observe (rw & REQ_FLUSH) and sends first bio marked as REQ_FLUSH, it guarantees nothing unless you wait for completion before submitting further bio-s! And ploop simply does not have the logic of waiting the first before sending others. And, to make things worse, not only dio_submit is affected, dio_sibmit_pad and dio_io_page to be fixed too. There are also some inline comments below... On 06/21/2016 06:55 AM, Dmitry Monakhov wrote: barrier code is broken in many ways: Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly. But request also can goes though ->dio_submit_alloc()->dio_submit_pad and write_page (for indexes) So in case of grow_dev we have following sequance: E_RELOC_DATA_READ: ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->delta->allocate ->io->submit_allloc: dio_submit_alloc ->dio_submit_pad E_DATA_WBI : data written, time to update index ->delta->allocate_complete:ploop_index_update ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->write_page ->ploop_map_wb_complete ->ploop_wb_complete_post_process ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); E_RELOC_NULLIFY: ->submit() BUG#2: currecntly kaio write_page silently ignores REQ_FLUSH BUG#3: io_direct:dio_submit if fua_delay is not possible we MUST tag all bios via REQ_FUA not just latest one. This patch unify barrier handling like follows: - Get rid of FORCE_{FLUSH,FUA} - Introduce DELAYED_FLUSH - fix fua handling for dio_submit - BUG_ON for REQ_FLUSH in kaio_page_write This makes reloc sequence optimal: io_direct RELOC_S: R1, W2, WBI:FLUSH|FUA RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA io_kaio RELOC_S: R1, W2:FUA, WBI:FUA RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA https://jira.sw.ru/browse/PSBM-47107 Signed-off-by: Dmitry Monakhov --- drivers/block/ploop/dev.c | 8 +--- drivers/block/ploop/io_direct.c | 30 ++- drivers/block/ploop/io_kaio.c | 23 + drivers/block/ploop/map.c | 45 ++--- include/linux/ploop/ploop.h | 19 + 5 files changed, 60 insertions(+), 65 deletions(-) diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c index 96f7850..fbc5f2f 100644 --- a/drivers/block/ploop/dev.c +++ b/drivers/block/ploop/dev.c @@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * preq) __TRACE("Z %p %u\n", preq, preq->req_cluster); +if (!preq->error) { +WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state)); +} while (preq->bl.head) { struct bio * bio = preq->bl.head; preq->bl.head = bio->bi_next; @@ -2530,9 +2533,8 @@ restart: top_delta = ploop_top_delta(plo); sbl.head = sbl.tail = preq->aux_bio; -/* Relocated data write required sync before BAT updatee */ -set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); - +/* Relocated data write required sync before BAT updatee + * this will happen inside index_update */ if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) { preq->eng_state = PLOOP_E_DATA_WBI; plo->st.bio_out++; diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c index a6d83fe..303eb70 100644 --- a/drivers/block/ploop/io_direct.c +++ b/drivers/block/ploop/io_direct.c @@ -83,28 +83,19 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq, int err; struct bio_list_walk bw; int preflush; -int postfua = 0; +int fua = 0; int write = !!(rw & REQ_WRITE); int bio_num; Your patch obsolet
Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v3
Dima, After more thinking I realized that the whole idea of PLOOP_REQ_DELAYED_FLUSH might be bogus: it is possible that we simply do not have many enough incoming FUA-s to make delaying lucrative. This patch actually mixes three things: 1) fix barriers for RELOC_A|S requests, 2) fix barriers for ordinary requests, 3) DELAYED_FLUSH optimization. So, please, split the patch into three and make some measurements demonstrating that applying "DELAYED_FLUSH optimization" patch on top of previous patches improves performance. I have an idea about how to fix barriers for ordinary requests -- see please the patch I'll send soon. The key point is that handling FLUSH-es is broken the same way as FUA: if you observe (rw & REQ_FLUSH) and sends first bio marked as REQ_FLUSH, it guarantees nothing unless you wait for completion before submitting further bio-s! And ploop simply does not have the logic of waiting the first before sending others. And, to make things worse, not only dio_submit is affected, dio_sibmit_pad and dio_io_page to be fixed too. There are also some inline comments below... On 06/21/2016 06:55 AM, Dmitry Monakhov wrote: barrier code is broken in many ways: Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly. But request also can goes though ->dio_submit_alloc()->dio_submit_pad and write_page (for indexes) So in case of grow_dev we have following sequance: E_RELOC_DATA_READ: ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->delta->allocate ->io->submit_allloc: dio_submit_alloc ->dio_submit_pad E_DATA_WBI : data written, time to update index ->delta->allocate_complete:ploop_index_update ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->write_page ->ploop_map_wb_complete ->ploop_wb_complete_post_process ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); E_RELOC_NULLIFY: ->submit() BUG#2: currecntly kaio write_page silently ignores REQ_FLUSH BUG#3: io_direct:dio_submit if fua_delay is not possible we MUST tag all bios via REQ_FUA not just latest one. This patch unify barrier handling like follows: - Get rid of FORCE_{FLUSH,FUA} - Introduce DELAYED_FLUSH - fix fua handling for dio_submit - BUG_ON for REQ_FLUSH in kaio_page_write This makes reloc sequence optimal: io_direct RELOC_S: R1, W2, WBI:FLUSH|FUA RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA io_kaio RELOC_S: R1, W2:FUA, WBI:FUA RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA https://jira.sw.ru/browse/PSBM-47107 Signed-off-by: Dmitry Monakhov --- drivers/block/ploop/dev.c | 8 +--- drivers/block/ploop/io_direct.c | 30 ++- drivers/block/ploop/io_kaio.c | 23 + drivers/block/ploop/map.c | 45 ++--- include/linux/ploop/ploop.h | 19 + 5 files changed, 60 insertions(+), 65 deletions(-) diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c index 96f7850..fbc5f2f 100644 --- a/drivers/block/ploop/dev.c +++ b/drivers/block/ploop/dev.c @@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * preq) __TRACE("Z %p %u\n", preq, preq->req_cluster); + if (!preq->error) { + WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state)); + } while (preq->bl.head) { struct bio * bio = preq->bl.head; preq->bl.head = bio->bi_next; @@ -2530,9 +2533,8 @@ restart: top_delta = ploop_top_delta(plo); sbl.head = sbl.tail = preq->aux_bio; - /* Relocated data write required sync before BAT updatee */ - set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); - + /* Relocated data write required sync before BAT updatee +* this will happen inside index_update */ if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) { preq->eng_state = PLOOP_E_DATA_WBI; plo->st.bio_out++; diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c index a6d83fe..303eb70 100644 --- a/drivers/block/ploop/io_direct.c +++ b/drivers/block/ploop/io_direct.c @@ -83,28 +83,19 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq, int err; struct bio_list_walk bw; int preflush; - int postfua = 0; + int fua = 0; int write = !!(rw & REQ_WRITE); int bio_num; Your patch obsoletes bio_num. Please remove it. trace_submit(preq); preflush = !!(rw & REQ_FLUSH); - - if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state)) - preflush = 1; - - if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state)) - postfua = 1; - - if (!postfua && ploop_req_delay_fua_possible(rw, preq)) { - + fua =
Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v2
On 06/21/2016 12:25 AM, Dmitry Monakhov wrote: Maxim Patlasov writes: Dima, I agree with general approach of this patch, but there are some (easy-to-fix) issues. See, please, inline comments below... On 06/20/2016 11:58 AM, Dmitry Monakhov wrote: barrier code is broken in many ways: Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly. But request also can goes though ->dio_submit_alloc()->dio_submit_pad and write_page (for indexes) So in case of grow_dev we have following sequance: E_RELOC_DATA_READ: ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->delta->allocate ->io->submit_allloc: dio_submit_alloc ->dio_submit_pad E_DATA_WBI : data written, time to update index ->delta->allocate_complete:ploop_index_update ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->write_page ->ploop_map_wb_complete ->ploop_wb_complete_post_process ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); E_RELOC_NULLIFY: ->submit() BUG#2: currecntly kaio write_page silently ignores REQ_FUA Sorry, I can't agree, it actually does not ignore: I've misstyped. I ment to say REQ_FLUSH. static void kaio_write_page(struct ploop_io * io, struct ploop_request * preq, struct page * page, sector_t sec, int fua) { /* No FUA in kaio, convert it to fsync */ if (fua) set_bit(PLOOP_REQ_KAIO_FSYNC, &preq->state); BUG#3: io_direct:dio_submit if fua_delay is not possible we MUST tag all bios via REQ_FUA not just latest one. No need to tag *all*. See inline comments below. This patch unify barrier handling like follows: - Get rid of FORCE_{FLUSH,FUA} - Introduce DELAYED_FLUSH, currecntly it supported only by io_direct - fix up fua handling for dio_submit This makes reloc sequence optimal: io_direct RELOC_S: R1, W2, WBI:FLUSH|FUA RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA io_kaio RELOC_S: R1, W2:FUA, WBI:FUA RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA https://jira.sw.ru/browse/PSBM-47107 Signed-off-by: Dmitry Monakhov --- drivers/block/ploop/dev.c | 8 +--- drivers/block/ploop/io_direct.c | 29 +- drivers/block/ploop/io_kaio.c | 17 ++-- drivers/block/ploop/map.c | 45 ++--- include/linux/ploop/ploop.h | 8 5 files changed, 54 insertions(+), 53 deletions(-) diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c index 96f7850..fbc5f2f 100644 --- a/drivers/block/ploop/dev.c +++ b/drivers/block/ploop/dev.c @@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * preq) __TRACE("Z %p %u\n", preq, preq->req_cluster); + if (!preq->error) { + WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state)); + } while (preq->bl.head) { struct bio * bio = preq->bl.head; preq->bl.head = bio->bi_next; @@ -2530,9 +2533,8 @@ restart: top_delta = ploop_top_delta(plo); sbl.head = sbl.tail = preq->aux_bio; - /* Relocated data write required sync before BAT updatee */ - set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); - + /* Relocated data write required sync before BAT updatee +* this will happen inside index_update */ if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) { preq->eng_state = PLOOP_E_DATA_WBI; plo->st.bio_out++; diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c index a6d83fe..d7ecd4a 100644 --- a/drivers/block/ploop/io_direct.c +++ b/drivers/block/ploop/io_direct.c @@ -90,21 +90,12 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq, trace_submit(preq); preflush = !!(rw & REQ_FLUSH); - - if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state)) - preflush = 1; - - if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state)) - postfua = 1; - - if (!postfua && ploop_req_delay_fua_possible(rw, preq)) { - + postfua = !!(rw & REQ_FUA); + if (ploop_req_delay_fua_possible(rw, preq)) { /* Mark req that delayed flush required */ - set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state); - } else if (rw & REQ_FUA) { - postfua = 1; + set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state); + postfua = 0; } "postfua" is a horrible name, let us see if we can get rid of it completely. Also, the way how ploop_req_delay_fua_possible implemented is prone to errors (see below an issue in kaio_complete_io_state). Let's rework it like this: Let it be postflush. static inline bool ploop_req_delay_fua_possible(struct ploop_request *preq) { return preq->eng_state ==
[Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v3
barrier code is broken in many ways: Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly. But request also can goes though ->dio_submit_alloc()->dio_submit_pad and write_page (for indexes) So in case of grow_dev we have following sequance: E_RELOC_DATA_READ: ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->delta->allocate ->io->submit_allloc: dio_submit_alloc ->dio_submit_pad E_DATA_WBI : data written, time to update index ->delta->allocate_complete:ploop_index_update ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->write_page ->ploop_map_wb_complete ->ploop_wb_complete_post_process ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); E_RELOC_NULLIFY: ->submit() BUG#2: currecntly kaio write_page silently ignores REQ_FLUSH BUG#3: io_direct:dio_submit if fua_delay is not possible we MUST tag all bios via REQ_FUA not just latest one. This patch unify barrier handling like follows: - Get rid of FORCE_{FLUSH,FUA} - Introduce DELAYED_FLUSH - fix fua handling for dio_submit - BUG_ON for REQ_FLUSH in kaio_page_write This makes reloc sequence optimal: io_direct RELOC_S: R1, W2, WBI:FLUSH|FUA RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA io_kaio RELOC_S: R1, W2:FUA, WBI:FUA RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA https://jira.sw.ru/browse/PSBM-47107 Signed-off-by: Dmitry Monakhov --- drivers/block/ploop/dev.c | 8 +--- drivers/block/ploop/io_direct.c | 30 ++- drivers/block/ploop/io_kaio.c | 23 + drivers/block/ploop/map.c | 45 ++--- include/linux/ploop/ploop.h | 19 + 5 files changed, 60 insertions(+), 65 deletions(-) diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c index 96f7850..fbc5f2f 100644 --- a/drivers/block/ploop/dev.c +++ b/drivers/block/ploop/dev.c @@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * preq) __TRACE("Z %p %u\n", preq, preq->req_cluster); + if (!preq->error) { + WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state)); + } while (preq->bl.head) { struct bio * bio = preq->bl.head; preq->bl.head = bio->bi_next; @@ -2530,9 +2533,8 @@ restart: top_delta = ploop_top_delta(plo); sbl.head = sbl.tail = preq->aux_bio; - /* Relocated data write required sync before BAT updatee */ - set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); - + /* Relocated data write required sync before BAT updatee +* this will happen inside index_update */ if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) { preq->eng_state = PLOOP_E_DATA_WBI; plo->st.bio_out++; diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c index a6d83fe..303eb70 100644 --- a/drivers/block/ploop/io_direct.c +++ b/drivers/block/ploop/io_direct.c @@ -83,28 +83,19 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq, int err; struct bio_list_walk bw; int preflush; - int postfua = 0; + int fua = 0; int write = !!(rw & REQ_WRITE); int bio_num; trace_submit(preq); preflush = !!(rw & REQ_FLUSH); - - if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state)) - preflush = 1; - - if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state)) - postfua = 1; - - if (!postfua && ploop_req_delay_fua_possible(rw, preq)) { - + fua = !!(rw & REQ_FUA); + if (fua && ploop_req_delay_fua_possible(rw, preq)) { /* Mark req that delayed flush required */ - set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state); - } else if (rw & REQ_FUA) { - postfua = 1; + set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state); + fua = 0; } - rw &= ~(REQ_FLUSH | REQ_FUA); @@ -238,8 +229,10 @@ flush_bio: rw2 |= REQ_FLUSH; preflush = 0; } - if (unlikely(postfua && !bl.head)) - rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0)); + /* Very unlikely, but correct. +* TODO: Optimize postfua via DELAY_FLUSH for any req state */ + if (unlikely(fua)) + rw2 |= REQ_FUA; ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw); submit_bio(rw2, b); @@ -1520,15 +1513,14 @@ dio_read_page(struct ploop_io * io, struct ploop_request * preq, static void dio_write_page(struct ploop_io * io, struct ploop_request * preq, - struct page * page, sector_t sec, int fua) + st
Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v2
Maxim Patlasov writes: > Dima, > > I agree with general approach of this patch, but there are some > (easy-to-fix) issues. See, please, inline comments below... > > On 06/20/2016 11:58 AM, Dmitry Monakhov wrote: >> barrier code is broken in many ways: >> Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly. >> But request also can goes though ->dio_submit_alloc()->dio_submit_pad and >> write_page (for indexes) >> So in case of grow_dev we have following sequance: >> >> E_RELOC_DATA_READ: >> ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); >>->delta->allocate >> ->io->submit_allloc: dio_submit_alloc >> ->dio_submit_pad >> E_DATA_WBI : data written, time to update index >>->delta->allocate_complete:ploop_index_update >> ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); >> ->write_page >> ->ploop_map_wb_complete >>->ploop_wb_complete_post_process >> ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); >> E_RELOC_NULLIFY: >> >> ->submit() >> >> BUG#2: currecntly kaio write_page silently ignores REQ_FUA > > Sorry, I can't agree, it actually does not ignore: I've misstyped. I ment to say REQ_FLUSH. > >> static void >> kaio_write_page(struct ploop_io * io, struct ploop_request * preq, >> struct page * page, sector_t sec, int fua) >> { >> /* No FUA in kaio, convert it to fsync */ >> if (fua) >> set_bit(PLOOP_REQ_KAIO_FSYNC, &preq->state); > > >> BUG#3: io_direct:dio_submit if fua_delay is not possible we MUST tag all >> bios via REQ_FUA >> not just latest one. > > No need to tag *all*. See inline comments below. > >> This patch unify barrier handling like follows: >> - Get rid of FORCE_{FLUSH,FUA} >> - Introduce DELAYED_FLUSH, currecntly it supported only by io_direct >> - fix up fua handling for dio_submit >> >> This makes reloc sequence optimal: >> io_direct >> RELOC_S: R1, W2, WBI:FLUSH|FUA >> RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA >> io_kaio >> RELOC_S: R1, W2:FUA, WBI:FUA >> RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA >> >> https://jira.sw.ru/browse/PSBM-47107 >> Signed-off-by: Dmitry Monakhov >> --- >> drivers/block/ploop/dev.c | 8 +--- >> drivers/block/ploop/io_direct.c | 29 +- >> drivers/block/ploop/io_kaio.c | 17 ++-- >> drivers/block/ploop/map.c | 45 >> ++--- >> include/linux/ploop/ploop.h | 8 >> 5 files changed, 54 insertions(+), 53 deletions(-) >> >> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c >> index 96f7850..fbc5f2f 100644 >> --- a/drivers/block/ploop/dev.c >> +++ b/drivers/block/ploop/dev.c >> @@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct >> ploop_request * preq) >> >> __TRACE("Z %p %u\n", preq, preq->req_cluster); >> >> +if (!preq->error) { >> +WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state)); >> +} >> while (preq->bl.head) { >> struct bio * bio = preq->bl.head; >> preq->bl.head = bio->bi_next; >> @@ -2530,9 +2533,8 @@ restart: >> top_delta = ploop_top_delta(plo); >> sbl.head = sbl.tail = preq->aux_bio; >> >> -/* Relocated data write required sync before BAT updatee */ >> -set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); >> - >> +/* Relocated data write required sync before BAT updatee >> + * this will happen inside index_update */ >> if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) { >> preq->eng_state = PLOOP_E_DATA_WBI; >> plo->st.bio_out++; >> diff --git a/drivers/block/ploop/io_direct.c >> b/drivers/block/ploop/io_direct.c >> index a6d83fe..d7ecd4a 100644 >> --- a/drivers/block/ploop/io_direct.c >> +++ b/drivers/block/ploop/io_direct.c >> @@ -90,21 +90,12 @@ dio_submit(struct ploop_io *io, struct ploop_request * >> preq, >> trace_submit(preq); >> >> preflush = !!(rw & REQ_FLUSH); >> - >> -if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state)) >> -preflush = 1; >> - >> -if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state)) >> -postfua = 1; >> - >> -if (!postfua && ploop_req_delay_fua_possible(rw, preq)) { >> - >> +postfua = !!(rw & REQ_FUA); >> +if (ploop_req_delay_fua_possible(rw, preq)) { >> /* Mark req that delayed flush required */ >> -set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state); >> -} else if (rw & REQ_FUA) { >> -postfua = 1; >> +set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state); >> +postfua = 0; >> } > > "postfua" is a horrible name, let us see if we can get rid of it > completely. Also, the way how ploop_req_delay_fua_possible implemented > is prone to err
Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v2
Dima, I agree with general approach of this patch, but there are some (easy-to-fix) issues. See, please, inline comments below... On 06/20/2016 11:58 AM, Dmitry Monakhov wrote: barrier code is broken in many ways: Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly. But request also can goes though ->dio_submit_alloc()->dio_submit_pad and write_page (for indexes) So in case of grow_dev we have following sequance: E_RELOC_DATA_READ: ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->delta->allocate ->io->submit_allloc: dio_submit_alloc ->dio_submit_pad E_DATA_WBI : data written, time to update index ->delta->allocate_complete:ploop_index_update ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->write_page ->ploop_map_wb_complete ->ploop_wb_complete_post_process ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); E_RELOC_NULLIFY: ->submit() BUG#2: currecntly kaio write_page silently ignores REQ_FUA Sorry, I can't agree, it actually does not ignore: static void kaio_write_page(struct ploop_io * io, struct ploop_request * preq, struct page * page, sector_t sec, int fua) { /* No FUA in kaio, convert it to fsync */ if (fua) set_bit(PLOOP_REQ_KAIO_FSYNC, &preq->state); BUG#3: io_direct:dio_submit if fua_delay is not possible we MUST tag all bios via REQ_FUA not just latest one. No need to tag *all*. See inline comments below. This patch unify barrier handling like follows: - Get rid of FORCE_{FLUSH,FUA} - Introduce DELAYED_FLUSH, currecntly it supported only by io_direct - fix up fua handling for dio_submit This makes reloc sequence optimal: io_direct RELOC_S: R1, W2, WBI:FLUSH|FUA RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA io_kaio RELOC_S: R1, W2:FUA, WBI:FUA RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA https://jira.sw.ru/browse/PSBM-47107 Signed-off-by: Dmitry Monakhov --- drivers/block/ploop/dev.c | 8 +--- drivers/block/ploop/io_direct.c | 29 +- drivers/block/ploop/io_kaio.c | 17 ++-- drivers/block/ploop/map.c | 45 ++--- include/linux/ploop/ploop.h | 8 5 files changed, 54 insertions(+), 53 deletions(-) diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c index 96f7850..fbc5f2f 100644 --- a/drivers/block/ploop/dev.c +++ b/drivers/block/ploop/dev.c @@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * preq) __TRACE("Z %p %u\n", preq, preq->req_cluster); + if (!preq->error) { + WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state)); + } while (preq->bl.head) { struct bio * bio = preq->bl.head; preq->bl.head = bio->bi_next; @@ -2530,9 +2533,8 @@ restart: top_delta = ploop_top_delta(plo); sbl.head = sbl.tail = preq->aux_bio; - /* Relocated data write required sync before BAT updatee */ - set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); - + /* Relocated data write required sync before BAT updatee +* this will happen inside index_update */ if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) { preq->eng_state = PLOOP_E_DATA_WBI; plo->st.bio_out++; diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c index a6d83fe..d7ecd4a 100644 --- a/drivers/block/ploop/io_direct.c +++ b/drivers/block/ploop/io_direct.c @@ -90,21 +90,12 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq, trace_submit(preq); preflush = !!(rw & REQ_FLUSH); - - if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state)) - preflush = 1; - - if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state)) - postfua = 1; - - if (!postfua && ploop_req_delay_fua_possible(rw, preq)) { - + postfua = !!(rw & REQ_FUA); + if (ploop_req_delay_fua_possible(rw, preq)) { /* Mark req that delayed flush required */ - set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state); - } else if (rw & REQ_FUA) { - postfua = 1; + set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state); + postfua = 0; } "postfua" is a horrible name, let us see if we can get rid of it completely. Also, the way how ploop_req_delay_fua_possible implemented is prone to errors (see below an issue in kaio_complete_io_state). Let's rework it like this: static inline bool ploop_req_delay_fua_possible(struct ploop_request *preq) { return preq->eng_state == PLOOP_E_DATA_WBI; } Then, that chunk in the dio_submit above might look as: /* If we can delay, mark req that delayed flush required */ if ((r
Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling
Dima, On 06/19/2016 06:06 AM, Dmitry Monakhov wrote: Maxim Patlasov writes: On 06/16/2016 09:30 AM, Dmitry Monakhov wrote: Dmitry Monakhov writes: Maxim Patlasov writes: Dima, I agree that the ploop barrier code is broken in many ways, but I don't think the patch actually fixes it. I hope you would agree that completion of REQ_FUA guarantees only landing that particular bio to the disk; it says nothing about flushing previously submitted (and completed) bio-s and it is also possible that power outage may catch us when this REQ_FUA is already landed to the disk, but previous bio-s are not yet. Actually it does (but implicitly) linux handles FUA as FLUSH,W,FLUSH. So yes. it would be more correct to tag WBI with FLUSH_FUA Hence, for RELOC_{A|S} requests we actually need something like that: RELOC_S: R1, W2, FLUSH:WB, WBI:FUA RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA (i.e. we do need to flush all previously submitted data before starting to update BAT on disk) Correct sequence: RELOC_S: R1, W2, WBI:FLUSH_FUA RELOC_A: R1, W2, WBI:FLUSH_FUA, W1:NULLIFY:FUA not simply: RELOC_S: R1, W2, WBI:FUA RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we could remove them completely (along we that optimization delaying incoming FUA) and re-implement all this stuff from scratch: 1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set REQ_FUA in preq->req_rw before calling ->submit(preq) 2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for RELOC_A|S in ploop_index_update and map_wb_complete 3) For that optimization delaying incoming FUA (what we do now if ploop_req_delay_fua_possible() returns true) we could introduce new ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update and map_wb_complete (the same thing as 2) above). And, yes, let's WARN_ON if we somehow missed its processing. Yes. This was one of my ideas. 1)FORCE_FLUSH, FORCE_FUA are redundant states which simply mirrors RELOC_{A,S} semantics. Lets get rid of that crap and simply introduce PLOOP_IO_FLUSH_DELAYED. 2) fix ->write_page to handle flush as it does with fua. The only complication I foresee is about how to teach kaio to pre-flush in kaio_write_page -- it's doable, but involves kaio_resubmit that's already pretty convoluted. Yes. kio_submit is correct, but kaio_write_page do not care about REQ_FLUSH. Crap. Currently kaio can handles fsync only via kaio_queue_fsync_req which is async and not suitable for page_write. I think it's doable to process page_write via kaio_fsync_thread, but it's tricky. Max let's make an agreement about terminology. The reason I wrote this is because linux internally interpret FUA as preflush,write,postflush which is wrong from academic point of view but it is the world we live it linux. Are you sure that this (FUA == preflush,write,postflush) is universally true (i.e. no exceptions)? What about bio-based block-device drivers? This is the reason I read code diferently from the way it was designed. Let's state that ploop is an ideal world where: FLUSH ==> preflush FUA ==> WRUTE,postflush In ideal word FUA is not obliged to be handled by postflush: it's enough to guarantee that *this* particular request went to platter, other requests may remain not-flushed-yet. Documentation/block/writeback_cache_control.txt is absolutely clear about it: The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the filesystem and will make sure that I/O completion for this request is only signaled after the data has been committed to non-volatile storage. ... If the FUA bit is not natively supported the block layer turns it into an empty REQ_FLUSH request after the actual write. For what reasona we can perform reloc scheme as: RELOC_A: R1,W2:FUA,WBI:FUA,W1:NULLIFY|FUA RELOC_A: R1,W2:FUA,WBI:FUA This allows effectively handle FUA and convert it to DELAYED_FLUSH where possible. Ploop has the concept of map_multi_updates. In short words, while you're handling one update, many others may come to PLOOP_E_INDEX_DELAY state. And, as soon as the first one is done, we modify many indices in one loop (see map_wb_complete), then write that page to disk only once. Having map_multi_update in mind, it may be suboptimal to make many W2:FUA-s -- it may be better to do many ordinary W2-s instead, and only one pre-FLUSH later -- when we're going to write BAT page on disk. Also let's clarify may_fua_delay semantics to exact eng_state may_fua_delay { int may_delay = 1; /* effectively this is equivalent of preq->eng_state != PLOOP_E_COMPLETE but it is more readable, and more error prone in future */ if (preq->eng_state != PLOOP_E_DATA_WBI) may_delay = 0 if ((test_bit(PLOOP_REQ_RELOC_S, &preq->state)) ||
[Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v2
barrier code is broken in many ways: Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly. But request also can goes though ->dio_submit_alloc()->dio_submit_pad and write_page (for indexes) So in case of grow_dev we have following sequance: E_RELOC_DATA_READ: ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->delta->allocate ->io->submit_allloc: dio_submit_alloc ->dio_submit_pad E_DATA_WBI : data written, time to update index ->delta->allocate_complete:ploop_index_update ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->write_page ->ploop_map_wb_complete ->ploop_wb_complete_post_process ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); E_RELOC_NULLIFY: ->submit() BUG#2: currecntly kaio write_page silently ignores REQ_FUA BUG#3: io_direct:dio_submit if fua_delay is not possible we MUST tag all bios via REQ_FUA not just latest one. This patch unify barrier handling like follows: - Get rid of FORCE_{FLUSH,FUA} - Introduce DELAYED_FLUSH, currecntly it supported only by io_direct - fix up fua handling for dio_submit This makes reloc sequence optimal: io_direct RELOC_S: R1, W2, WBI:FLUSH|FUA RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA io_kaio RELOC_S: R1, W2:FUA, WBI:FUA RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA https://jira.sw.ru/browse/PSBM-47107 Signed-off-by: Dmitry Monakhov --- drivers/block/ploop/dev.c | 8 +--- drivers/block/ploop/io_direct.c | 29 +- drivers/block/ploop/io_kaio.c | 17 ++-- drivers/block/ploop/map.c | 45 ++--- include/linux/ploop/ploop.h | 8 5 files changed, 54 insertions(+), 53 deletions(-) diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c index 96f7850..fbc5f2f 100644 --- a/drivers/block/ploop/dev.c +++ b/drivers/block/ploop/dev.c @@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * preq) __TRACE("Z %p %u\n", preq, preq->req_cluster); + if (!preq->error) { + WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state)); + } while (preq->bl.head) { struct bio * bio = preq->bl.head; preq->bl.head = bio->bi_next; @@ -2530,9 +2533,8 @@ restart: top_delta = ploop_top_delta(plo); sbl.head = sbl.tail = preq->aux_bio; - /* Relocated data write required sync before BAT updatee */ - set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); - + /* Relocated data write required sync before BAT updatee +* this will happen inside index_update */ if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) { preq->eng_state = PLOOP_E_DATA_WBI; plo->st.bio_out++; diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c index a6d83fe..d7ecd4a 100644 --- a/drivers/block/ploop/io_direct.c +++ b/drivers/block/ploop/io_direct.c @@ -90,21 +90,12 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq, trace_submit(preq); preflush = !!(rw & REQ_FLUSH); - - if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state)) - preflush = 1; - - if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state)) - postfua = 1; - - if (!postfua && ploop_req_delay_fua_possible(rw, preq)) { - + postfua = !!(rw & REQ_FUA); + if (ploop_req_delay_fua_possible(rw, preq)) { /* Mark req that delayed flush required */ - set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state); - } else if (rw & REQ_FUA) { - postfua = 1; + set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state); + postfua = 0; } - rw &= ~(REQ_FLUSH | REQ_FUA); @@ -238,14 +229,15 @@ flush_bio: rw2 |= REQ_FLUSH; preflush = 0; } - if (unlikely(postfua && !bl.head)) - rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0)); + /* Very unlikely, but correct. +* TODO: Optimize postfua via DELAY_FLUSH for any req state */ + if (unlikely(!postfua)) + rw2 |= REQ_FUA; ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw); submit_bio(rw2, b); bio_num++; } - ploop_complete_io_request(preq); return; @@ -1520,15 +1512,14 @@ dio_read_page(struct ploop_io * io, struct ploop_request * preq, static void dio_write_page(struct ploop_io * io, struct ploop_request * preq, - struct page * page, sector_t sec, int fua) + struct page * page, sector_t sec, unsigned long rw) { if (!(io->files.file->f_mode
Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling
Maxim Patlasov writes: > On 06/16/2016 09:30 AM, Dmitry Monakhov wrote: >> Dmitry Monakhov writes: >> >>> Maxim Patlasov writes: >>> Dima, I agree that the ploop barrier code is broken in many ways, but I don't think the patch actually fixes it. I hope you would agree that completion of REQ_FUA guarantees only landing that particular bio to the disk; it says nothing about flushing previously submitted (and completed) bio-s and it is also possible that power outage may catch us when this REQ_FUA is already landed to the disk, but previous bio-s are not yet. >>> Actually it does (but implicitly) linux handles FUA as FLUSH,W,FLUSH. >>> So yes. it would be more correct to tag WBI with FLUSH_FUA Hence, for RELOC_{A|S} requests we actually need something like that: RELOC_S: R1, W2, FLUSH:WB, WBI:FUA RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA (i.e. we do need to flush all previously submitted data before starting to update BAT on disk) >>> Correct sequence: >>> RELOC_S: R1, W2, WBI:FLUSH_FUA >>> RELOC_A: R1, W2, WBI:FLUSH_FUA, W1:NULLIFY:FUA >>> not simply: > RELOC_S: R1, W2, WBI:FUA > RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we could remove them completely (along we that optimization delaying incoming FUA) and re-implement all this stuff from scratch: 1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set REQ_FUA in preq->req_rw before calling ->submit(preq) 2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for RELOC_A|S in ploop_index_update and map_wb_complete 3) For that optimization delaying incoming FUA (what we do now if ploop_req_delay_fua_possible() returns true) we could introduce new ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update and map_wb_complete (the same thing as 2) above). And, yes, let's WARN_ON if we somehow missed its processing. >>> Yes. This was one of my ideas. >>> 1)FORCE_FLUSH, FORCE_FUA are redundant states which simply mirrors >>> RELOC_{A,S} semantics. Lets get rid of that crap and simply introduce >>> PLOOP_IO_FLUSH_DELAYED. >>> 2) fix ->write_page to handle flush as it does with fua. The only complication I foresee is about how to teach kaio to pre-flush in kaio_write_page -- it's doable, but involves kaio_resubmit that's already pretty convoluted. >>> Yes. kio_submit is correct, but kaio_write_page do not care about REQ_FLUSH. >> Crap. Currently kaio can handles fsync only via kaio_queue_fsync_req >> which is async and not suitable for page_write. > > I think it's doable to process page_write via kaio_fsync_thread, but > it's tricky. > >> Max let's make an agreement about terminology. >> The reason I wrote this is because linux internally interpret FUA as >> preflush,write,postflush which is wrong from academic point of view but >> it is the world we live it linux. > > Are you sure that this (FUA == preflush,write,postflush) is universally > true (i.e. no exceptions)? What about bio-based block-device drivers? > >> This is the reason I read code >> diferently from the way it was designed. >> Let's state that ploop is an ideal world where: >> FLUSH ==> preflush >> FUA ==> WRUTE,postflush > > In ideal word FUA is not obliged to be handled by postflush: it's enough > to guarantee that *this* particular request went to platter, other > requests may remain not-flushed-yet. > Documentation/block/writeback_cache_control.txt is absolutely clear > about it: > >> The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted >> from the >> filesystem and will make sure that I/O completion for this request is only >> signaled after the data has been committed to non-volatile storage. >> ... >> If the FUA bit is not natively supported the block >> layer turns it into an empty REQ_FLUSH request after the actual write. > > >> For what reasona we can perform reloc scheme as: >> >> RELOC_A: R1,W2:FUA,WBI:FUA,W1:NULLIFY|FUA >> RELOC_A: R1,W2:FUA,WBI:FUA >> >> This allows effectively handle FUA and convert it to DELAYED_FLUSH where >> possible. > > Ploop has the concept of map_multi_updates. In short words, while you're > handling one update, many others may come to PLOOP_E_INDEX_DELAY state. > And, as soon as the first one is done, we modify many indices in one > loop (see map_wb_complete), then write that page to disk only once. > Having map_multi_update in mind, it may be suboptimal to make many > W2:FUA-s -- it may be better to do many ordinary W2-s instead, and only > one pre-FLUSH later -- when we're going to write BAT page on disk. > >> Also let's clarify may_fua_delay semantics to exact eng_stat
Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling
On 06/16/2016 09:30 AM, Dmitry Monakhov wrote: Dmitry Monakhov writes: Maxim Patlasov writes: Dima, I agree that the ploop barrier code is broken in many ways, but I don't think the patch actually fixes it. I hope you would agree that completion of REQ_FUA guarantees only landing that particular bio to the disk; it says nothing about flushing previously submitted (and completed) bio-s and it is also possible that power outage may catch us when this REQ_FUA is already landed to the disk, but previous bio-s are not yet. Actually it does (but implicitly) linux handles FUA as FLUSH,W,FLUSH. So yes. it would be more correct to tag WBI with FLUSH_FUA Hence, for RELOC_{A|S} requests we actually need something like that: RELOC_S: R1, W2, FLUSH:WB, WBI:FUA RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA (i.e. we do need to flush all previously submitted data before starting to update BAT on disk) Correct sequence: RELOC_S: R1, W2, WBI:FLUSH_FUA RELOC_A: R1, W2, WBI:FLUSH_FUA, W1:NULLIFY:FUA not simply: RELOC_S: R1, W2, WBI:FUA RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we could remove them completely (along we that optimization delaying incoming FUA) and re-implement all this stuff from scratch: 1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set REQ_FUA in preq->req_rw before calling ->submit(preq) 2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for RELOC_A|S in ploop_index_update and map_wb_complete 3) For that optimization delaying incoming FUA (what we do now if ploop_req_delay_fua_possible() returns true) we could introduce new ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update and map_wb_complete (the same thing as 2) above). And, yes, let's WARN_ON if we somehow missed its processing. Yes. This was one of my ideas. 1)FORCE_FLUSH, FORCE_FUA are redundant states which simply mirrors RELOC_{A,S} semantics. Lets get rid of that crap and simply introduce PLOOP_IO_FLUSH_DELAYED. 2) fix ->write_page to handle flush as it does with fua. The only complication I foresee is about how to teach kaio to pre-flush in kaio_write_page -- it's doable, but involves kaio_resubmit that's already pretty convoluted. Yes. kio_submit is correct, but kaio_write_page do not care about REQ_FLUSH. Crap. Currently kaio can handles fsync only via kaio_queue_fsync_req which is async and not suitable for page_write. I think it's doable to process page_write via kaio_fsync_thread, but it's tricky. Max let's make an agreement about terminology. The reason I wrote this is because linux internally interpret FUA as preflush,write,postflush which is wrong from academic point of view but it is the world we live it linux. Are you sure that this (FUA == preflush,write,postflush) is universally true (i.e. no exceptions)? What about bio-based block-device drivers? This is the reason I read code diferently from the way it was designed. Let's state that ploop is an ideal world where: FLUSH ==> preflush FUA ==> WRUTE,postflush In ideal word FUA is not obliged to be handled by postflush: it's enough to guarantee that *this* particular request went to platter, other requests may remain not-flushed-yet. Documentation/block/writeback_cache_control.txt is absolutely clear about it: The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the filesystem and will make sure that I/O completion for this request is only signaled after the data has been committed to non-volatile storage. ... If the FUA bit is not natively supported the block layer turns it into an empty REQ_FLUSH request after the actual write. For what reasona we can perform reloc scheme as: RELOC_A: R1,W2:FUA,WBI:FUA,W1:NULLIFY|FUA RELOC_A: R1,W2:FUA,WBI:FUA This allows effectively handle FUA and convert it to DELAYED_FLUSH where possible. Ploop has the concept of map_multi_updates. In short words, while you're handling one update, many others may come to PLOOP_E_INDEX_DELAY state. And, as soon as the first one is done, we modify many indices in one loop (see map_wb_complete), then write that page to disk only once. Having map_multi_update in mind, it may be suboptimal to make many W2:FUA-s -- it may be better to do many ordinary W2-s instead, and only one pre-FLUSH later -- when we're going to write BAT page on disk. Also let's clarify may_fua_delay semantics to exact eng_state may_fua_delay { int may_delay = 1; /* effectively this is equivalent of preq->eng_state != PLOOP_E_COMPLETE but it is more readable, and more error prone in future */ if (preq->eng_state != PLOOP_E_DATA_WBI) may_delay = 0 if ((test_bit(PLOOP_REQ_RELOC_S, &preq->state)) || (test_bit(PLOOP_REQ_RELOC_A, &preq->state))) may_delay
Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling
Dmitry Monakhov writes: > Maxim Patlasov writes: > >> Dima, >> >> I agree that the ploop barrier code is broken in many ways, but I don't >> think the patch actually fixes it. I hope you would agree that >> completion of REQ_FUA guarantees only landing that particular bio to the >> disk; it says nothing about flushing previously submitted (and >> completed) bio-s and it is also possible that power outage may catch us >> when this REQ_FUA is already landed to the disk, but previous bio-s are >> not yet. > Actually it does (but implicitly) linux handles FUA as FLUSH,W,FLUSH. > So yes. it would be more correct to tag WBI with FLUSH_FUA >> Hence, for RELOC_{A|S} requests we actually need something like that: >> >> RELOC_S: R1, W2, FLUSH:WB, WBI:FUA >> RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA >> >> (i.e. we do need to flush all previously submitted data before starting >> to update BAT on disk) >> > Correct sequence: > RELOC_S: R1, W2, WBI:FLUSH_FUA > RELOC_A: R1, W2, WBI:FLUSH_FUA, W1:NULLIFY:FUA > >> not simply: >> >>> RELOC_S: R1, W2, WBI:FUA >>> RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA >> >> Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and >> PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we >> could remove them completely (along we that optimization delaying >> incoming FUA) and re-implement all this stuff from scratch: >> >> 1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set >> REQ_FUA in preq->req_rw before calling ->submit(preq) >> >> 2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating >> BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for >> RELOC_A|S in ploop_index_update and map_wb_complete >> >> 3) For that optimization delaying incoming FUA (what we do now if >> ploop_req_delay_fua_possible() returns true) we could introduce new >> ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update >> and map_wb_complete (the same thing as 2) above). And, yes, let's >> WARN_ON if we somehow missed its processing. > Yes. This was one of my ideas. > 1)FORCE_FLUSH, FORCE_FUA are redundant states which simply mirrors > RELOC_{A,S} semantics. Lets get rid of that crap and simply introduce > PLOOP_IO_FLUSH_DELAYED. > 2) fix ->write_page to handle flush as it does with fua. >> >> The only complication I foresee is about how to teach kaio to pre-flush >> in kaio_write_page -- it's doable, but involves kaio_resubmit that's >> already pretty convoluted. >> > Yes. kio_submit is correct, but kaio_write_page do not care about REQ_FLUSH. Crap. Currently kaio can handles fsync only via kaio_queue_fsync_req which is async and not suitable for page_write. Max let's make an agreement about terminology. The reason I wrote this is because linux internally interpret FUA as preflush,write,postflush which is wrong from academic point of view but it is the world we live it linux. This is the reason I read code diferently from the way it was designed. Let's state that ploop is an ideal world where: FLUSH ==> preflush FUA ==> WRUTE,postflush For what reasona we can perform reloc scheme as: RELOC_A: R1,W2:FUA,WBI:FUA,W1:NULLIFY|FUA RELOC_A: R1,W2:FUA,WBI:FUA This allows effectively handle FUA and convert it to DELAYED_FLUSH where possible. Also let's clarify may_fua_delay semantics to exact eng_state may_fua_delay { int may_delay = 1; /* effectively this is equivalent of preq->eng_state != PLOOP_E_COMPLETE but it is more readable, and more error prone in future */ if (preq->eng_state != PLOOP_E_DATA_WBI) may_delay = 0 if ((test_bit(PLOOP_REQ_RELOC_S, &preq->state)) || (test_bit(PLOOP_REQ_RELOC_A, &preq->state))) may_delay = 0; return may_delay; } k >> Btw, I accidentally noticed awful silly bug in kaio_complete_io_state(): >> we checks for REQ_FUA after clearing it! This makes all FUA-s on >> ordinary kaio_submit path silently lost... >> >> Thanks, >> Maxim >> >> >> On 06/15/2016 07:49 AM, Dmitry Monakhov wrote: >>> barrier code is broken in many ways: >>> Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly. >>> But request also can goes though ->dio_submit_alloc()->dio_submit_pad and >>> write_page (for indexes) >>> So in case of grow_dev we have following sequance: >>> >>> E_RELOC_DATA_READ: >>> ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); >>>->delta->allocate >>> ->io->submit_allloc: dio_submit_alloc >>> ->dio_submit_pad >>> E_DATA_WBI : data written, time to update index >>>->delta->allocate_complete:ploop_index_update >>> ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); >>> ->write_page >>> ->ploop_map_wb_complete >>>->ploop_wb_complete_post_process >>> ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); >>> E_RELOC_NULLIFY: >>> >>>
Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling
Maxim Patlasov writes: > Dima, > > I agree that the ploop barrier code is broken in many ways, but I don't > think the patch actually fixes it. I hope you would agree that > completion of REQ_FUA guarantees only landing that particular bio to the > disk; it says nothing about flushing previously submitted (and > completed) bio-s and it is also possible that power outage may catch us > when this REQ_FUA is already landed to the disk, but previous bio-s are > not yet. Actually it does (but implicitly) linux handles FUA as FLUSH,W,FLUSH. So yes. it would be more correct to tag WBI with FLUSH_FUA > Hence, for RELOC_{A|S} requests we actually need something like that: > > RELOC_S: R1, W2, FLUSH:WB, WBI:FUA > RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA > > (i.e. we do need to flush all previously submitted data before starting > to update BAT on disk) > Correct sequence: RELOC_S: R1, W2, WBI:FLUSH_FUA RELOC_A: R1, W2, WBI:FLUSH_FUA, W1:NULLIFY:FUA > not simply: > >> RELOC_S: R1, W2, WBI:FUA >> RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA > > Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and > PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we > could remove them completely (along we that optimization delaying > incoming FUA) and re-implement all this stuff from scratch: > > 1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set > REQ_FUA in preq->req_rw before calling ->submit(preq) > > 2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating > BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for > RELOC_A|S in ploop_index_update and map_wb_complete > > 3) For that optimization delaying incoming FUA (what we do now if > ploop_req_delay_fua_possible() returns true) we could introduce new > ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update > and map_wb_complete (the same thing as 2) above). And, yes, let's > WARN_ON if we somehow missed its processing. Yes. This was one of my ideas. 1)FORCE_FLUSH, FORCE_FUA are redundant states which simply mirrors RELOC_{A,S} semantics. Lets get rid of that crap and simply introduce PLOOP_IO_FLUSH_DELAYED. 2) fix ->write_page to handle flush as it does with fua. > > The only complication I foresee is about how to teach kaio to pre-flush > in kaio_write_page -- it's doable, but involves kaio_resubmit that's > already pretty convoluted. > Yes. kio_submit is correct, but kaio_write_page do not care about REQ_FLUSH. > Btw, I accidentally noticed awful silly bug in kaio_complete_io_state(): > we checks for REQ_FUA after clearing it! This makes all FUA-s on > ordinary kaio_submit path silently lost... > > Thanks, > Maxim > > > On 06/15/2016 07:49 AM, Dmitry Monakhov wrote: >> barrier code is broken in many ways: >> Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly. >> But request also can goes though ->dio_submit_alloc()->dio_submit_pad and >> write_page (for indexes) >> So in case of grow_dev we have following sequance: >> >> E_RELOC_DATA_READ: >> ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); >>->delta->allocate >> ->io->submit_allloc: dio_submit_alloc >> ->dio_submit_pad >> E_DATA_WBI : data written, time to update index >>->delta->allocate_complete:ploop_index_update >> ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); >> ->write_page >> ->ploop_map_wb_complete >>->ploop_wb_complete_post_process >> ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); >> E_RELOC_NULLIFY: >> >> ->submit() >> >> This patch unify barrier handling like follows: >> - Add assertation to ploop_complete_request for FORCE_{FLUSH,FUA} state >> - Perform explicit FUA inside index_update for RELOC requests. >> >> This makes reloc sequence optimal: >> RELOC_S: R1, W2, WBI:FUA >> RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA >> >> https://jira.sw.ru/browse/PSBM-47107 >> Signed-off-by: Dmitry Monakhov >> --- >> drivers/block/ploop/dev.c | 10 +++--- >> drivers/block/ploop/map.c | 29 - >> 2 files changed, 19 insertions(+), 20 deletions(-) >> >> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c >> index 96f7850..998fe71 100644 >> --- a/drivers/block/ploop/dev.c >> +++ b/drivers/block/ploop/dev.c >> @@ -1224,6 +1224,11 @@ static void ploop_complete_request(struct >> ploop_request * preq) >> >> __TRACE("Z %p %u\n", preq, preq->req_cluster); >> >> +if (!preq->error) { >> +unsigned long state = READ_ONCE(preq->state); >> +WARN_ON(state & (1 << PLOOP_REQ_FORCE_FUA)); >> +WARN_ON(state & (1 <> +} >> while (preq->bl.head) { >> struct bio * bio = preq->bl.head; >> preq->bl.head = bio->bi_next; >> @@ -2530,9 +2535,8 @@ restart: >> top_delta =
Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling
Dima, I agree that the ploop barrier code is broken in many ways, but I don't think the patch actually fixes it. I hope you would agree that completion of REQ_FUA guarantees only landing that particular bio to the disk; it says nothing about flushing previously submitted (and completed) bio-s and it is also possible that power outage may catch us when this REQ_FUA is already landed to the disk, but previous bio-s are not yet. Hence, for RELOC_{A|S} requests we actually need something like that: RELOC_S: R1, W2, FLUSH:WB, WBI:FUA RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA (i.e. we do need to flush all previously submitted data before starting to update BAT on disk) not simply: RELOC_S: R1, W2, WBI:FUA RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we could remove them completely (along we that optimization delaying incoming FUA) and re-implement all this stuff from scratch: 1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set REQ_FUA in preq->req_rw before calling ->submit(preq) 2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for RELOC_A|S in ploop_index_update and map_wb_complete 3) For that optimization delaying incoming FUA (what we do now if ploop_req_delay_fua_possible() returns true) we could introduce new ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update and map_wb_complete (the same thing as 2) above). And, yes, let's WARN_ON if we somehow missed its processing. The only complication I foresee is about how to teach kaio to pre-flush in kaio_write_page -- it's doable, but involves kaio_resubmit that's already pretty convoluted. Btw, I accidentally noticed awful silly bug in kaio_complete_io_state(): we checks for REQ_FUA after clearing it! This makes all FUA-s on ordinary kaio_submit path silently lost... Thanks, Maxim On 06/15/2016 07:49 AM, Dmitry Monakhov wrote: barrier code is broken in many ways: Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly. But request also can goes though ->dio_submit_alloc()->dio_submit_pad and write_page (for indexes) So in case of grow_dev we have following sequance: E_RELOC_DATA_READ: ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->delta->allocate ->io->submit_allloc: dio_submit_alloc ->dio_submit_pad E_DATA_WBI : data written, time to update index ->delta->allocate_complete:ploop_index_update ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->write_page ->ploop_map_wb_complete ->ploop_wb_complete_post_process ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); E_RELOC_NULLIFY: ->submit() This patch unify barrier handling like follows: - Add assertation to ploop_complete_request for FORCE_{FLUSH,FUA} state - Perform explicit FUA inside index_update for RELOC requests. This makes reloc sequence optimal: RELOC_S: R1, W2, WBI:FUA RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA https://jira.sw.ru/browse/PSBM-47107 Signed-off-by: Dmitry Monakhov --- drivers/block/ploop/dev.c | 10 +++--- drivers/block/ploop/map.c | 29 - 2 files changed, 19 insertions(+), 20 deletions(-) diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c index 96f7850..998fe71 100644 --- a/drivers/block/ploop/dev.c +++ b/drivers/block/ploop/dev.c @@ -1224,6 +1224,11 @@ static void ploop_complete_request(struct ploop_request * preq) __TRACE("Z %p %u\n", preq, preq->req_cluster); + if (!preq->error) { + unsigned long state = READ_ONCE(preq->state); + WARN_ON(state & (1 << PLOOP_REQ_FORCE_FUA)); + WARN_ON(state & (1bl.head; preq->bl.head = bio->bi_next; @@ -2530,9 +2535,8 @@ restart: top_delta = ploop_top_delta(plo); sbl.head = sbl.tail = preq->aux_bio; - /* Relocated data write required sync before BAT updatee */ - set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); - + /* Relocated data write required sync before BAT updatee +* this will happen inside index_update */ if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) { preq->eng_state = PLOOP_E_DATA_WBI; plo->st.bio_out++; diff --git a/drivers/block/ploop/map.c b/drivers/block/ploop/map.c index 3a6365d..c17e598 100644 --- a/drivers/block/ploop/map.c +++ b/drivers/block/ploop/map.c @@ -896,6 +896,7 @@ void ploop_index_update(struct ploop_request * preq) struct ploop_device * plo = preq->plo; struct map_node * m = preq->map;
[Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling
barrier code is broken in many ways: Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly. But request also can goes though ->dio_submit_alloc()->dio_submit_pad and write_page (for indexes) So in case of grow_dev we have following sequance: E_RELOC_DATA_READ: ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->delta->allocate ->io->submit_allloc: dio_submit_alloc ->dio_submit_pad E_DATA_WBI : data written, time to update index ->delta->allocate_complete:ploop_index_update ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); ->write_page ->ploop_map_wb_complete ->ploop_wb_complete_post_process ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); E_RELOC_NULLIFY: ->submit() This patch unify barrier handling like follows: - Add assertation to ploop_complete_request for FORCE_{FLUSH,FUA} state - Perform explicit FUA inside index_update for RELOC requests. This makes reloc sequence optimal: RELOC_S: R1, W2, WBI:FUA RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA https://jira.sw.ru/browse/PSBM-47107 Signed-off-by: Dmitry Monakhov --- drivers/block/ploop/dev.c | 10 +++--- drivers/block/ploop/map.c | 29 - 2 files changed, 19 insertions(+), 20 deletions(-) diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c index 96f7850..998fe71 100644 --- a/drivers/block/ploop/dev.c +++ b/drivers/block/ploop/dev.c @@ -1224,6 +1224,11 @@ static void ploop_complete_request(struct ploop_request * preq) __TRACE("Z %p %u\n", preq, preq->req_cluster); + if (!preq->error) { + unsigned long state = READ_ONCE(preq->state); + WARN_ON(state & (1 << PLOOP_REQ_FORCE_FUA)); + WARN_ON(state & (1bl.head; preq->bl.head = bio->bi_next; @@ -2530,9 +2535,8 @@ restart: top_delta = ploop_top_delta(plo); sbl.head = sbl.tail = preq->aux_bio; - /* Relocated data write required sync before BAT updatee */ - set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); - + /* Relocated data write required sync before BAT updatee +* this will happen inside index_update */ if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) { preq->eng_state = PLOOP_E_DATA_WBI; plo->st.bio_out++; diff --git a/drivers/block/ploop/map.c b/drivers/block/ploop/map.c index 3a6365d..c17e598 100644 --- a/drivers/block/ploop/map.c +++ b/drivers/block/ploop/map.c @@ -896,6 +896,7 @@ void ploop_index_update(struct ploop_request * preq) struct ploop_device * plo = preq->plo; struct map_node * m = preq->map; struct ploop_delta * top_delta = map_top_delta(m->parent); + int fua = !!(preq->req_rw & REQ_FUA); u32 idx; map_index_t blk; int old_level; @@ -953,13 +954,13 @@ void ploop_index_update(struct ploop_request * preq) __TRACE("wbi %p %u %p\n", preq, preq->req_cluster, m); plo->st.map_single_writes++; top_delta->ops->map_index(top_delta, m->mn_start, &sec); - /* Relocate requires consistent writes, mark such reqs appropriately */ + /* Relocate requires consistent index update */ if (test_bit(PLOOP_REQ_RELOC_A, &preq->state) || test_bit(PLOOP_REQ_RELOC_S, &preq->state)) - set_bit(PLOOP_REQ_FORCE_FUA, &preq->state); - - top_delta->io.ops->write_page(&top_delta->io, preq, page, sec, - !!(preq->req_rw & REQ_FUA)); + fua = 1; + if (fua) + clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state); + top_delta->io.ops->write_page(&top_delta->io, preq, page, sec, fua); put_page(page); return; @@ -1078,7 +1079,7 @@ static void map_wb_complete(struct map_node * m, int err) int delayed = 0; unsigned int idx; sector_t sec; - int fua, force_fua; + int fua; /* First, complete processing of written back indices, * finally instantiate indices in mapping cache. @@ -1149,7 +1150,6 @@ static void map_wb_complete(struct map_node * m, int err) main_preq = NULL; fua = 0; - force_fua = 0; list_for_each_safe(cursor, tmp, &m->io_queue) { struct ploop_request * preq; @@ -1168,13 +1168,12 @@ static void map_wb_complete(struct map_node * m, int err) break; } - if (preq->req_rw & REQ_FUA) + if (preq->req_rw & REQ_FUA || + test_bit(PLOOP_REQ_RELOC_A, &preq->state) || + test_bit(PLOOP_REQ_RELOC_S, &preq->state)) { + clear_bit(PLOOP_REQ