Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v3

2016-06-22 Thread Maxim Patlasov

Dima,

I'm uneasy that we still have handling RELOC_A|S broken. It seems we 
have full agreement that for such requests we can do unconditional 
FLUSH|FUA when we call write_page from ploop_index_update() and 
map_wb_complete(). And your idea to implement it by passing FLUSH|FUA 
for io_direct and post_fsync=1 for io_kaio is smart and OK. Will you 
send patch for that (fix barriers for RELOC_A|S requests)?


Thanks,
Maxim

On 06/21/2016 04:56 PM, Maxim Patlasov wrote:

Dima,

After more thinking I realized that the whole idea of 
PLOOP_REQ_DELAYED_FLUSH might be bogus: it is possible that we simply 
do not have many enough incoming FUA-s to make delaying lucrative. 
This patch actually mixes three things: 1) fix barriers for RELOC_A|S 
requests, 2) fix barriers for ordinary requests, 3) DELAYED_FLUSH 
optimization. So, please, split the patch into three and make some 
measurements demonstrating that applying "DELAYED_FLUSH optimization" 
patch on top of previous patches improves performance.


I have an idea about how to fix barriers for ordinary requests -- see 
please the patch I'll send soon. The key point is that handling 
FLUSH-es is broken the same way as FUA: if you observe (rw & 
REQ_FLUSH) and sends first bio marked as REQ_FLUSH, it guarantees 
nothing unless you wait for completion before submitting further 
bio-s! And ploop simply does not have the logic of waiting the first 
before sending others. And, to make things worse, not only dio_submit 
is affected, dio_sibmit_pad and dio_io_page to be fixed too.


There are also some inline comments below...

On 06/21/2016 06:55 AM, Dmitry Monakhov wrote:

barrier code is broken in many ways:
Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} 
correctly.
But request also can goes though ->dio_submit_alloc()->dio_submit_pad 
and write_page (for indexes)

So in case of grow_dev we have following sequance:

E_RELOC_DATA_READ:
  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
   ->delta->allocate
  ->io->submit_allloc: dio_submit_alloc
->dio_submit_pad
E_DATA_WBI : data written, time to update index
->delta->allocate_complete:ploop_index_update
 ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
 ->write_page
 ->ploop_map_wb_complete
   ->ploop_wb_complete_post_process
 ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
E_RELOC_NULLIFY:

->submit()

BUG#2: currecntly kaio write_page silently ignores REQ_FLUSH
BUG#3: io_direct:dio_submit  if fua_delay is not possible we MUST tag 
all bios via REQ_FUA

not just latest one.
This patch unify barrier handling like follows:
- Get rid of FORCE_{FLUSH,FUA}
- Introduce DELAYED_FLUSH
- fix fua handling for dio_submit
- BUG_ON for REQ_FLUSH in kaio_page_write

This makes reloc sequence optimal:
io_direct
RELOC_S: R1, W2, WBI:FLUSH|FUA
RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA
io_kaio
RELOC_S: R1, W2:FUA, WBI:FUA
RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA

https://jira.sw.ru/browse/PSBM-47107
Signed-off-by: Dmitry Monakhov 
---
  drivers/block/ploop/dev.c   |  8 +---
  drivers/block/ploop/io_direct.c | 30 ++-
  drivers/block/ploop/io_kaio.c   | 23 +
  drivers/block/ploop/map.c   | 45 
++---

  include/linux/ploop/ploop.h | 19 +
  5 files changed, 60 insertions(+), 65 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 96f7850..fbc5f2f 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct 
ploop_request * preq)

__TRACE("Z %p %u\n", preq, preq->req_cluster);
  +if (!preq->error) {
+WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state));
+}
  while (preq->bl.head) {
  struct bio * bio = preq->bl.head;
  preq->bl.head = bio->bi_next;
@@ -2530,9 +2533,8 @@ restart:
  top_delta = ploop_top_delta(plo);
  sbl.head = sbl.tail = preq->aux_bio;
  -/* Relocated data write required sync before BAT updatee */
-set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
-
+/* Relocated data write required sync before BAT updatee
+ * this will happen inside index_update */
  if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
  preq->eng_state = PLOOP_E_DATA_WBI;
  plo->st.bio_out++;
diff --git a/drivers/block/ploop/io_direct.c 
b/drivers/block/ploop/io_direct.c

index a6d83fe..303eb70 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -83,28 +83,19 @@ dio_submit(struct ploop_io *io, struct 
ploop_request * preq,

  int err;
  struct bio_list_walk bw;
  int preflush;
-int postfua = 0;
+int fua = 0;
  int write = !!(rw & REQ_WRITE);
  int bio_num;


Your patch obsolet

Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v3

2016-06-21 Thread Maxim Patlasov

Dima,

After more thinking I realized that the whole idea of 
PLOOP_REQ_DELAYED_FLUSH might be bogus: it is possible that we simply do 
not have many enough incoming FUA-s to make delaying lucrative. This 
patch actually mixes three things: 1) fix barriers for RELOC_A|S 
requests, 2) fix barriers for ordinary requests, 3) DELAYED_FLUSH 
optimization. So, please, split the patch into three and make some 
measurements demonstrating that applying "DELAYED_FLUSH optimization" 
patch on top of previous patches improves performance.


I have an idea about how to fix barriers for ordinary requests -- see 
please the patch I'll send soon. The key point is that handling FLUSH-es 
is broken the same way as FUA: if you observe (rw & REQ_FLUSH) and sends 
first bio marked as REQ_FLUSH, it guarantees nothing unless you wait for 
completion before submitting further bio-s! And ploop simply does not 
have the logic of waiting the first before sending others. And, to make 
things worse, not only dio_submit is affected, dio_sibmit_pad and 
dio_io_page to be fixed too.


There are also some inline comments below...

On 06/21/2016 06:55 AM, Dmitry Monakhov wrote:

barrier code is broken in many ways:
Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
But request also can goes though ->dio_submit_alloc()->dio_submit_pad and 
write_page (for indexes)
So in case of grow_dev we have following sequance:

E_RELOC_DATA_READ:
  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
   ->delta->allocate
  ->io->submit_allloc: dio_submit_alloc
->dio_submit_pad
E_DATA_WBI : data written, time to update index
   ->delta->allocate_complete:ploop_index_update
 ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
 ->write_page
 ->ploop_map_wb_complete
   ->ploop_wb_complete_post_process
 ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
E_RELOC_NULLIFY:

->submit()

BUG#2: currecntly kaio write_page silently ignores REQ_FLUSH
BUG#3: io_direct:dio_submit  if fua_delay is not possible we MUST tag all bios 
via REQ_FUA
not just latest one.
This patch unify barrier handling like follows:
- Get rid of FORCE_{FLUSH,FUA}
- Introduce DELAYED_FLUSH
- fix fua handling for dio_submit
- BUG_ON for REQ_FLUSH in kaio_page_write

This makes reloc sequence optimal:
io_direct
RELOC_S: R1, W2, WBI:FLUSH|FUA
RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA
io_kaio
RELOC_S: R1, W2:FUA, WBI:FUA
RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA

https://jira.sw.ru/browse/PSBM-47107
Signed-off-by: Dmitry Monakhov 
---
  drivers/block/ploop/dev.c   |  8 +---
  drivers/block/ploop/io_direct.c | 30 ++-
  drivers/block/ploop/io_kaio.c   | 23 +
  drivers/block/ploop/map.c   | 45 ++---
  include/linux/ploop/ploop.h | 19 +
  5 files changed, 60 insertions(+), 65 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 96f7850..fbc5f2f 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * 
preq)
  
  	__TRACE("Z %p %u\n", preq, preq->req_cluster);
  
+	if (!preq->error) {

+   WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state));
+   }
while (preq->bl.head) {
struct bio * bio = preq->bl.head;
preq->bl.head = bio->bi_next;
@@ -2530,9 +2533,8 @@ restart:
top_delta = ploop_top_delta(plo);
sbl.head = sbl.tail = preq->aux_bio;
  
-		/* Relocated data write required sync before BAT updatee */

-   set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
-
+   /* Relocated data write required sync before BAT updatee
+* this will happen inside index_update */
if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
preq->eng_state = PLOOP_E_DATA_WBI;
plo->st.bio_out++;
diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index a6d83fe..303eb70 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -83,28 +83,19 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
int err;
struct bio_list_walk bw;
int preflush;
-   int postfua = 0;
+   int fua = 0;
int write = !!(rw & REQ_WRITE);
int bio_num;


Your patch obsoletes bio_num. Please remove it.

  
  	trace_submit(preq);
  
  	preflush = !!(rw & REQ_FLUSH);

-
-   if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state))
-   preflush = 1;
-
-   if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state))
-   postfua = 1;
-
-   if (!postfua && ploop_req_delay_fua_possible(rw, preq)) {
-
+   fua =

Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v2

2016-06-21 Thread Maxim Patlasov

On 06/21/2016 12:25 AM, Dmitry Monakhov wrote:

Maxim Patlasov  writes:


Dima,

I agree with general approach of this patch, but there are some
(easy-to-fix) issues. See, please, inline comments below...

On 06/20/2016 11:58 AM, Dmitry Monakhov wrote:

barrier code is broken in many ways:
Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
But request also can goes though ->dio_submit_alloc()->dio_submit_pad and 
write_page (for indexes)
So in case of grow_dev we have following sequance:

E_RELOC_DATA_READ:
   ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
->delta->allocate
   ->io->submit_allloc: dio_submit_alloc
 ->dio_submit_pad
E_DATA_WBI : data written, time to update index
->delta->allocate_complete:ploop_index_update
  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
  ->write_page
  ->ploop_map_wb_complete
->ploop_wb_complete_post_process
  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
E_RELOC_NULLIFY:

 ->submit()

BUG#2: currecntly kaio write_page silently ignores REQ_FUA

Sorry, I can't agree, it actually does not ignore:

I've misstyped. I ment to say REQ_FLUSH.

static void
kaio_write_page(struct ploop_io * io, struct ploop_request * preq,
  struct page * page, sector_t sec, int fua)
{
 /* No FUA in kaio, convert it to fsync */
 if (fua)
 set_bit(PLOOP_REQ_KAIO_FSYNC, &preq->state);



BUG#3: io_direct:dio_submit  if fua_delay is not possible we MUST tag all bios 
via REQ_FUA
 not just latest one.

No need to tag *all*. See inline comments below.


This patch unify barrier handling like follows:
- Get rid of FORCE_{FLUSH,FUA}
- Introduce DELAYED_FLUSH, currecntly it supported only by io_direct
- fix up fua handling for dio_submit

This makes reloc sequence optimal:
io_direct
RELOC_S: R1, W2, WBI:FLUSH|FUA
RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA
io_kaio
RELOC_S: R1, W2:FUA, WBI:FUA
RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA

https://jira.sw.ru/browse/PSBM-47107
Signed-off-by: Dmitry Monakhov 
---
   drivers/block/ploop/dev.c   |  8 +---
   drivers/block/ploop/io_direct.c | 29 +-
   drivers/block/ploop/io_kaio.c   | 17 ++--
   drivers/block/ploop/map.c   | 45 
++---
   include/linux/ploop/ploop.h |  8 
   5 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 96f7850..fbc5f2f 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * 
preq)
   
   	__TRACE("Z %p %u\n", preq, preq->req_cluster);
   
+	if (!preq->error) {

+   WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state));
+   }
while (preq->bl.head) {
struct bio * bio = preq->bl.head;
preq->bl.head = bio->bi_next;
@@ -2530,9 +2533,8 @@ restart:
top_delta = ploop_top_delta(plo);
sbl.head = sbl.tail = preq->aux_bio;
   
-		/* Relocated data write required sync before BAT updatee */

-   set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
-
+   /* Relocated data write required sync before BAT updatee
+* this will happen inside index_update */
if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
preq->eng_state = PLOOP_E_DATA_WBI;
plo->st.bio_out++;
diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index a6d83fe..d7ecd4a 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -90,21 +90,12 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
trace_submit(preq);
   
   	preflush = !!(rw & REQ_FLUSH);

-
-   if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state))
-   preflush = 1;
-
-   if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state))
-   postfua = 1;
-
-   if (!postfua && ploop_req_delay_fua_possible(rw, preq)) {
-
+   postfua = !!(rw & REQ_FUA);
+   if (ploop_req_delay_fua_possible(rw, preq)) {
/* Mark req that delayed flush required */
-   set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state);
-   } else if (rw & REQ_FUA) {
-   postfua = 1;
+   set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state);
+   postfua = 0;
}

"postfua" is a horrible name, let us see if we can get rid of it
completely. Also, the way how ploop_req_delay_fua_possible implemented
is prone to errors (see below an issue in kaio_complete_io_state). Let's
rework it like this:

Let it be postflush.

static inline bool ploop_req_delay_fua_possible(struct ploop_request
*preq)
{
 return preq->eng_state ==

[Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v3

2016-06-21 Thread Dmitry Monakhov
barrier code is broken in many ways:
Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
But request also can goes though ->dio_submit_alloc()->dio_submit_pad and 
write_page (for indexes)
So in case of grow_dev we have following sequance:

E_RELOC_DATA_READ:
 ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
  ->delta->allocate
 ->io->submit_allloc: dio_submit_alloc
   ->dio_submit_pad
E_DATA_WBI : data written, time to update index
  ->delta->allocate_complete:ploop_index_update
->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
->write_page
->ploop_map_wb_complete
  ->ploop_wb_complete_post_process
->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
E_RELOC_NULLIFY:

   ->submit()

BUG#2: currecntly kaio write_page silently ignores REQ_FLUSH
BUG#3: io_direct:dio_submit  if fua_delay is not possible we MUST tag all bios 
via REQ_FUA
   not just latest one.
This patch unify barrier handling like follows:
- Get rid of FORCE_{FLUSH,FUA}
- Introduce DELAYED_FLUSH
- fix fua handling for dio_submit
- BUG_ON for REQ_FLUSH in kaio_page_write

This makes reloc sequence optimal:
io_direct
RELOC_S: R1, W2, WBI:FLUSH|FUA
RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA
io_kaio
RELOC_S: R1, W2:FUA, WBI:FUA
RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA

https://jira.sw.ru/browse/PSBM-47107
Signed-off-by: Dmitry Monakhov 
---
 drivers/block/ploop/dev.c   |  8 +---
 drivers/block/ploop/io_direct.c | 30 ++-
 drivers/block/ploop/io_kaio.c   | 23 +
 drivers/block/ploop/map.c   | 45 ++---
 include/linux/ploop/ploop.h | 19 +
 5 files changed, 60 insertions(+), 65 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 96f7850..fbc5f2f 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * 
preq)
 
__TRACE("Z %p %u\n", preq, preq->req_cluster);
 
+   if (!preq->error) {
+   WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state));
+   }
while (preq->bl.head) {
struct bio * bio = preq->bl.head;
preq->bl.head = bio->bi_next;
@@ -2530,9 +2533,8 @@ restart:
top_delta = ploop_top_delta(plo);
sbl.head = sbl.tail = preq->aux_bio;
 
-   /* Relocated data write required sync before BAT updatee */
-   set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
-
+   /* Relocated data write required sync before BAT updatee
+* this will happen inside index_update */
if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
preq->eng_state = PLOOP_E_DATA_WBI;
plo->st.bio_out++;
diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index a6d83fe..303eb70 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -83,28 +83,19 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
int err;
struct bio_list_walk bw;
int preflush;
-   int postfua = 0;
+   int fua = 0;
int write = !!(rw & REQ_WRITE);
int bio_num;
 
trace_submit(preq);
 
preflush = !!(rw & REQ_FLUSH);
-
-   if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state))
-   preflush = 1;
-
-   if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state))
-   postfua = 1;
-
-   if (!postfua && ploop_req_delay_fua_possible(rw, preq)) {
-
+   fua = !!(rw & REQ_FUA);
+   if (fua && ploop_req_delay_fua_possible(rw, preq)) {
/* Mark req that delayed flush required */
-   set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state);
-   } else if (rw & REQ_FUA) {
-   postfua = 1;
+   set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state);
+   fua = 0;
}
-
rw &= ~(REQ_FLUSH | REQ_FUA);
 
 
@@ -238,8 +229,10 @@ flush_bio:
rw2 |= REQ_FLUSH;
preflush = 0;
}
-   if (unlikely(postfua && !bl.head))
-   rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
+   /* Very unlikely, but correct.
+* TODO: Optimize postfua via DELAY_FLUSH for any req state */
+   if (unlikely(fua))
+   rw2 |= REQ_FUA;
 
ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);
submit_bio(rw2, b);
@@ -1520,15 +1513,14 @@ dio_read_page(struct ploop_io * io, struct 
ploop_request * preq,
 
 static void
 dio_write_page(struct ploop_io * io, struct ploop_request * preq,
-  struct page * page, sector_t sec, int fua)
+  st

Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v2

2016-06-21 Thread Dmitry Monakhov
Maxim Patlasov  writes:

> Dima,
>
> I agree with general approach of this patch, but there are some 
> (easy-to-fix) issues. See, please, inline comments below...
>
> On 06/20/2016 11:58 AM, Dmitry Monakhov wrote:
>> barrier code is broken in many ways:
>> Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
>> But request also can goes though ->dio_submit_alloc()->dio_submit_pad and 
>> write_page (for indexes)
>> So in case of grow_dev we have following sequance:
>>
>> E_RELOC_DATA_READ:
>>   ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>>->delta->allocate
>>   ->io->submit_allloc: dio_submit_alloc
>> ->dio_submit_pad
>> E_DATA_WBI : data written, time to update index
>>->delta->allocate_complete:ploop_index_update
>>  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>>  ->write_page
>>  ->ploop_map_wb_complete
>>->ploop_wb_complete_post_process
>>  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>> E_RELOC_NULLIFY:
>>
>> ->submit()
>>
>> BUG#2: currecntly kaio write_page silently ignores REQ_FUA
>
> Sorry, I can't agree, it actually does not ignore:
I've misstyped. I ment to say REQ_FLUSH.
>
>> static void
>> kaio_write_page(struct ploop_io * io, struct ploop_request * preq,
>>  struct page * page, sector_t sec, int fua)
>> {
>> /* No FUA in kaio, convert it to fsync */
>> if (fua)
>> set_bit(PLOOP_REQ_KAIO_FSYNC, &preq->state);
>
>
>> BUG#3: io_direct:dio_submit  if fua_delay is not possible we MUST tag all 
>> bios via REQ_FUA
>> not just latest one.
>
> No need to tag *all*. See inline comments below.
>
>> This patch unify barrier handling like follows:
>> - Get rid of FORCE_{FLUSH,FUA}
>> - Introduce DELAYED_FLUSH, currecntly it supported only by io_direct
>> - fix up fua handling for dio_submit
>>
>> This makes reloc sequence optimal:
>> io_direct
>> RELOC_S: R1, W2, WBI:FLUSH|FUA
>> RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA
>> io_kaio
>> RELOC_S: R1, W2:FUA, WBI:FUA
>> RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA
>>
>> https://jira.sw.ru/browse/PSBM-47107
>> Signed-off-by: Dmitry Monakhov 
>> ---
>>   drivers/block/ploop/dev.c   |  8 +---
>>   drivers/block/ploop/io_direct.c | 29 +-
>>   drivers/block/ploop/io_kaio.c   | 17 ++--
>>   drivers/block/ploop/map.c   | 45 
>> ++---
>>   include/linux/ploop/ploop.h |  8 
>>   5 files changed, 54 insertions(+), 53 deletions(-)
>>
>> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
>> index 96f7850..fbc5f2f 100644
>> --- a/drivers/block/ploop/dev.c
>> +++ b/drivers/block/ploop/dev.c
>> @@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct 
>> ploop_request * preq)
>>   
>>  __TRACE("Z %p %u\n", preq, preq->req_cluster);
>>   
>> +if (!preq->error) {
>> +WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state));
>> +}
>>  while (preq->bl.head) {
>>  struct bio * bio = preq->bl.head;
>>  preq->bl.head = bio->bi_next;
>> @@ -2530,9 +2533,8 @@ restart:
>>  top_delta = ploop_top_delta(plo);
>>  sbl.head = sbl.tail = preq->aux_bio;
>>   
>> -/* Relocated data write required sync before BAT updatee */
>> -set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>> -
>> +/* Relocated data write required sync before BAT updatee
>> + * this will happen inside index_update */
>>  if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
>>  preq->eng_state = PLOOP_E_DATA_WBI;
>>  plo->st.bio_out++;
>> diff --git a/drivers/block/ploop/io_direct.c 
>> b/drivers/block/ploop/io_direct.c
>> index a6d83fe..d7ecd4a 100644
>> --- a/drivers/block/ploop/io_direct.c
>> +++ b/drivers/block/ploop/io_direct.c
>> @@ -90,21 +90,12 @@ dio_submit(struct ploop_io *io, struct ploop_request * 
>> preq,
>>  trace_submit(preq);
>>   
>>  preflush = !!(rw & REQ_FLUSH);
>> -
>> -if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state))
>> -preflush = 1;
>> -
>> -if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state))
>> -postfua = 1;
>> -
>> -if (!postfua && ploop_req_delay_fua_possible(rw, preq)) {
>> -
>> +postfua = !!(rw & REQ_FUA);
>> +if (ploop_req_delay_fua_possible(rw, preq)) {
>>  /* Mark req that delayed flush required */
>> -set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state);
>> -} else if (rw & REQ_FUA) {
>> -postfua = 1;
>> +set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state);
>> +postfua = 0;
>>  }
>
> "postfua" is a horrible name, let us see if we can get rid of it 
> completely. Also, the way how ploop_req_delay_fua_possible implemented 
> is prone to err

Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v2

2016-06-20 Thread Maxim Patlasov

Dima,

I agree with general approach of this patch, but there are some 
(easy-to-fix) issues. See, please, inline comments below...


On 06/20/2016 11:58 AM, Dmitry Monakhov wrote:

barrier code is broken in many ways:
Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
But request also can goes though ->dio_submit_alloc()->dio_submit_pad and 
write_page (for indexes)
So in case of grow_dev we have following sequance:

E_RELOC_DATA_READ:
  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
   ->delta->allocate
  ->io->submit_allloc: dio_submit_alloc
->dio_submit_pad
E_DATA_WBI : data written, time to update index
   ->delta->allocate_complete:ploop_index_update
 ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
 ->write_page
 ->ploop_map_wb_complete
   ->ploop_wb_complete_post_process
 ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
E_RELOC_NULLIFY:

->submit()

BUG#2: currecntly kaio write_page silently ignores REQ_FUA


Sorry, I can't agree, it actually does not ignore:


static void
kaio_write_page(struct ploop_io * io, struct ploop_request * preq,
 struct page * page, sector_t sec, int fua)
{
/* No FUA in kaio, convert it to fsync */
if (fua)
set_bit(PLOOP_REQ_KAIO_FSYNC, &preq->state);




BUG#3: io_direct:dio_submit  if fua_delay is not possible we MUST tag all bios 
via REQ_FUA
not just latest one.


No need to tag *all*. See inline comments below.


This patch unify barrier handling like follows:
- Get rid of FORCE_{FLUSH,FUA}
- Introduce DELAYED_FLUSH, currecntly it supported only by io_direct
- fix up fua handling for dio_submit

This makes reloc sequence optimal:
io_direct
RELOC_S: R1, W2, WBI:FLUSH|FUA
RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA
io_kaio
RELOC_S: R1, W2:FUA, WBI:FUA
RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA

https://jira.sw.ru/browse/PSBM-47107
Signed-off-by: Dmitry Monakhov 
---
  drivers/block/ploop/dev.c   |  8 +---
  drivers/block/ploop/io_direct.c | 29 +-
  drivers/block/ploop/io_kaio.c   | 17 ++--
  drivers/block/ploop/map.c   | 45 ++---
  include/linux/ploop/ploop.h |  8 
  5 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 96f7850..fbc5f2f 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * 
preq)
  
  	__TRACE("Z %p %u\n", preq, preq->req_cluster);
  
+	if (!preq->error) {

+   WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state));
+   }
while (preq->bl.head) {
struct bio * bio = preq->bl.head;
preq->bl.head = bio->bi_next;
@@ -2530,9 +2533,8 @@ restart:
top_delta = ploop_top_delta(plo);
sbl.head = sbl.tail = preq->aux_bio;
  
-		/* Relocated data write required sync before BAT updatee */

-   set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
-
+   /* Relocated data write required sync before BAT updatee
+* this will happen inside index_update */
if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
preq->eng_state = PLOOP_E_DATA_WBI;
plo->st.bio_out++;
diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index a6d83fe..d7ecd4a 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -90,21 +90,12 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
trace_submit(preq);
  
  	preflush = !!(rw & REQ_FLUSH);

-
-   if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state))
-   preflush = 1;
-
-   if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state))
-   postfua = 1;
-
-   if (!postfua && ploop_req_delay_fua_possible(rw, preq)) {
-
+   postfua = !!(rw & REQ_FUA);
+   if (ploop_req_delay_fua_possible(rw, preq)) {
/* Mark req that delayed flush required */
-   set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state);
-   } else if (rw & REQ_FUA) {
-   postfua = 1;
+   set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state);
+   postfua = 0;
}


"postfua" is a horrible name, let us see if we can get rid of it 
completely. Also, the way how ploop_req_delay_fua_possible implemented 
is prone to errors (see below an issue in kaio_complete_io_state). Let's 
rework it like this:


static inline bool ploop_req_delay_fua_possible(struct ploop_request 
*preq)

{
return preq->eng_state == PLOOP_E_DATA_WBI;
}


Then, that chunk in the dio_submit above might look as:


/* If we can delay, mark req that delayed flush required */
if ((r

Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling

2016-06-20 Thread Maxim Patlasov

Dima,

On 06/19/2016 06:06 AM, Dmitry Monakhov wrote:

Maxim Patlasov  writes:


On 06/16/2016 09:30 AM, Dmitry Monakhov wrote:

Dmitry Monakhov  writes:


Maxim Patlasov  writes:


Dima,

I agree that the ploop barrier code is broken in many ways, but I don't
think the patch actually fixes it. I hope you would agree that
completion of REQ_FUA guarantees only landing that particular bio to the
disk; it says nothing about flushing previously submitted (and
completed) bio-s and it is also possible that power outage may catch us
when this REQ_FUA is already landed to the disk, but previous bio-s are
not yet.

Actually it does (but implicitly) linux handles FUA as FLUSH,W,FLUSH.
So yes. it would be more correct to tag WBI with FLUSH_FUA

Hence, for RELOC_{A|S} requests we actually need something like that:

RELOC_S: R1, W2, FLUSH:WB, WBI:FUA
RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA

(i.e. we do need to flush all previously submitted data before starting
to update BAT on disk)


Correct sequence:
RELOC_S: R1, W2, WBI:FLUSH_FUA
RELOC_A: R1, W2, WBI:FLUSH_FUA, W1:NULLIFY:FUA


not simply:


RELOC_S: R1, W2, WBI:FUA
RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA

Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and
PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we
could remove them completely (along we that optimization delaying
incoming FUA) and re-implement all this stuff from scratch:

1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set
REQ_FUA in preq->req_rw before calling ->submit(preq)

2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating
BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for
RELOC_A|S in ploop_index_update and map_wb_complete

3) For that optimization delaying incoming FUA (what we do now if
ploop_req_delay_fua_possible() returns true) we could introduce new
ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update
and map_wb_complete (the same thing as 2) above). And, yes, let's
WARN_ON if we somehow missed its processing.

Yes. This was one of my ideas.
1)FORCE_FLUSH, FORCE_FUA are redundant states which simply mirrors
RELOC_{A,S} semantics. Lets get rid of that crap and simply introduce
PLOOP_IO_FLUSH_DELAYED.
2) fix ->write_page to handle flush as it does with fua.

The only complication I foresee is about how to teach kaio to pre-flush
in kaio_write_page -- it's doable, but involves kaio_resubmit that's
already pretty convoluted.


Yes. kio_submit is correct, but kaio_write_page do not care about REQ_FLUSH.

Crap. Currently kaio can handles fsync only via kaio_queue_fsync_req
which is async and not suitable for page_write.

I think it's doable to process page_write via kaio_fsync_thread, but
it's tricky.


Max let's make an agreement about terminology.
The reason I wrote this is because linux internally interpret FUA as
preflush,write,postflush which is wrong from academic point of view but
it is the world we live it linux.

Are you sure that this  (FUA == preflush,write,postflush) is universally
true (i.e. no exceptions)? What about bio-based block-device drivers?


This is the reason I read code
diferently from the way it was designed.
Let's state that ploop is an ideal world where:
FLUSH ==> preflush
FUA   ==> WRUTE,postflush

In ideal word FUA is not obliged to be handled by postflush: it's enough
to guarantee that *this* particular request went to platter, other
requests may remain not-flushed-yet.
Documentation/block/writeback_cache_control.txt is absolutely clear
about it:


The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted
from the
filesystem and will make sure that I/O completion for this request is only
signaled after the data has been committed to non-volatile storage.
...
If the FUA bit is not natively supported the block
layer turns it into an empty REQ_FLUSH request after the actual write.



For what reasona we can perform reloc scheme as:

RELOC_A: R1,W2:FUA,WBI:FUA,W1:NULLIFY|FUA
RELOC_A: R1,W2:FUA,WBI:FUA

This allows effectively handle FUA and convert it to DELAYED_FLUSH where
possible.

Ploop has the concept of map_multi_updates. In short words, while you're
handling one update, many others may come to PLOOP_E_INDEX_DELAY state.
And, as soon as the first one is done, we modify many indices in one
loop (see map_wb_complete), then write that page to disk only once.
Having map_multi_update in mind, it may be suboptimal to make many
W2:FUA-s -- it may be better to do many ordinary W2-s instead, and only
one pre-FLUSH later -- when we're going to write BAT page on disk.


Also let's clarify may_fua_delay semantics to exact eng_state

may_fua_delay {

int may_delay = 1;
/* effectively this is equivalent of
   preq->eng_state != PLOOP_E_COMPLETE
   but it is more readable, and more error prone in future
*/
if (preq->eng_state != PLOOP_E_DATA_WBI)
may_delay = 0
if ((test_bit(PLOOP_REQ_RELOC_S, &preq->state)) ||

[Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v2

2016-06-20 Thread Dmitry Monakhov
barrier code is broken in many ways:
Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
But request also can goes though ->dio_submit_alloc()->dio_submit_pad and 
write_page (for indexes)
So in case of grow_dev we have following sequance:

E_RELOC_DATA_READ:
 ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
  ->delta->allocate
 ->io->submit_allloc: dio_submit_alloc
   ->dio_submit_pad
E_DATA_WBI : data written, time to update index
  ->delta->allocate_complete:ploop_index_update
->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
->write_page
->ploop_map_wb_complete
  ->ploop_wb_complete_post_process
->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
E_RELOC_NULLIFY:

   ->submit()

BUG#2: currecntly kaio write_page silently ignores REQ_FUA
BUG#3: io_direct:dio_submit  if fua_delay is not possible we MUST tag all bios 
via REQ_FUA
   not just latest one.
This patch unify barrier handling like follows:
- Get rid of FORCE_{FLUSH,FUA}
- Introduce DELAYED_FLUSH, currecntly it supported only by io_direct
- fix up fua handling for dio_submit

This makes reloc sequence optimal:
io_direct
RELOC_S: R1, W2, WBI:FLUSH|FUA
RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA
io_kaio
RELOC_S: R1, W2:FUA, WBI:FUA
RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA

https://jira.sw.ru/browse/PSBM-47107
Signed-off-by: Dmitry Monakhov 
---
 drivers/block/ploop/dev.c   |  8 +---
 drivers/block/ploop/io_direct.c | 29 +-
 drivers/block/ploop/io_kaio.c   | 17 ++--
 drivers/block/ploop/map.c   | 45 ++---
 include/linux/ploop/ploop.h |  8 
 5 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 96f7850..fbc5f2f 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * 
preq)
 
__TRACE("Z %p %u\n", preq, preq->req_cluster);
 
+   if (!preq->error) {
+   WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state));
+   }
while (preq->bl.head) {
struct bio * bio = preq->bl.head;
preq->bl.head = bio->bi_next;
@@ -2530,9 +2533,8 @@ restart:
top_delta = ploop_top_delta(plo);
sbl.head = sbl.tail = preq->aux_bio;
 
-   /* Relocated data write required sync before BAT updatee */
-   set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
-
+   /* Relocated data write required sync before BAT updatee
+* this will happen inside index_update */
if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
preq->eng_state = PLOOP_E_DATA_WBI;
plo->st.bio_out++;
diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index a6d83fe..d7ecd4a 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -90,21 +90,12 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
trace_submit(preq);
 
preflush = !!(rw & REQ_FLUSH);
-
-   if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state))
-   preflush = 1;
-
-   if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state))
-   postfua = 1;
-
-   if (!postfua && ploop_req_delay_fua_possible(rw, preq)) {
-
+   postfua = !!(rw & REQ_FUA);
+   if (ploop_req_delay_fua_possible(rw, preq)) {
/* Mark req that delayed flush required */
-   set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state);
-   } else if (rw & REQ_FUA) {
-   postfua = 1;
+   set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state);
+   postfua = 0;
}
-
rw &= ~(REQ_FLUSH | REQ_FUA);
 
 
@@ -238,14 +229,15 @@ flush_bio:
rw2 |= REQ_FLUSH;
preflush = 0;
}
-   if (unlikely(postfua && !bl.head))
-   rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
+   /* Very unlikely, but correct.
+* TODO: Optimize postfua via DELAY_FLUSH for any req state */
+   if (unlikely(!postfua))
+   rw2 |= REQ_FUA;
 
ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);
submit_bio(rw2, b);
bio_num++;
}
-
ploop_complete_io_request(preq);
return;
 
@@ -1520,15 +1512,14 @@ dio_read_page(struct ploop_io * io, struct 
ploop_request * preq,
 
 static void
 dio_write_page(struct ploop_io * io, struct ploop_request * preq,
-  struct page * page, sector_t sec, int fua)
+  struct page * page, sector_t sec, unsigned long rw)
 {
if (!(io->files.file->f_mode

Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling

2016-06-19 Thread Dmitry Monakhov
Maxim Patlasov  writes:

> On 06/16/2016 09:30 AM, Dmitry Monakhov wrote:
>> Dmitry Monakhov  writes:
>>
>>> Maxim Patlasov  writes:
>>>
 Dima,

 I agree that the ploop barrier code is broken in many ways, but I don't
 think the patch actually fixes it. I hope you would agree that
 completion of REQ_FUA guarantees only landing that particular bio to the
 disk; it says nothing about flushing previously submitted (and
 completed) bio-s and it is also possible that power outage may catch us
 when this REQ_FUA is already landed to the disk, but previous bio-s are
 not yet.
>>> Actually it does (but implicitly) linux handles FUA as FLUSH,W,FLUSH.
>>> So yes. it would be more correct to tag WBI with FLUSH_FUA
 Hence, for RELOC_{A|S} requests we actually need something like that:

RELOC_S: R1, W2, FLUSH:WB, WBI:FUA
RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA

 (i.e. we do need to flush all previously submitted data before starting
 to update BAT on disk)

>>> Correct sequence:
>>> RELOC_S: R1, W2, WBI:FLUSH_FUA
>>> RELOC_A: R1, W2, WBI:FLUSH_FUA, W1:NULLIFY:FUA
>>>
 not simply:

> RELOC_S: R1, W2, WBI:FUA
> RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA
 Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and
 PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we
 could remove them completely (along we that optimization delaying
 incoming FUA) and re-implement all this stuff from scratch:

 1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set
 REQ_FUA in preq->req_rw before calling ->submit(preq)

 2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating
 BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for
 RELOC_A|S in ploop_index_update and map_wb_complete

 3) For that optimization delaying incoming FUA (what we do now if
 ploop_req_delay_fua_possible() returns true) we could introduce new
 ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update
 and map_wb_complete (the same thing as 2) above). And, yes, let's
 WARN_ON if we somehow missed its processing.
>>> Yes. This was one of my ideas.
>>> 1)FORCE_FLUSH, FORCE_FUA are redundant states which simply mirrors
>>> RELOC_{A,S} semantics. Lets get rid of that crap and simply introduce
>>> PLOOP_IO_FLUSH_DELAYED.
>>> 2) fix ->write_page to handle flush as it does with fua.
 The only complication I foresee is about how to teach kaio to pre-flush
 in kaio_write_page -- it's doable, but involves kaio_resubmit that's
 already pretty convoluted.

>>> Yes. kio_submit is correct, but kaio_write_page do not care about REQ_FLUSH.
>> Crap. Currently kaio can handles fsync only via kaio_queue_fsync_req
>> which is async and not suitable for page_write.
>
> I think it's doable to process page_write via kaio_fsync_thread, but 
> it's tricky.
>
>> Max let's make an agreement about terminology.
>> The reason I wrote this is because linux internally interpret FUA as
>> preflush,write,postflush which is wrong from academic point of view but
>> it is the world we live it linux.
>
> Are you sure that this  (FUA == preflush,write,postflush) is universally 
> true (i.e. no exceptions)? What about bio-based block-device drivers?
>
>> This is the reason I read code
>> diferently from the way it was designed.
>> Let's state that ploop is an ideal world where:
>> FLUSH ==> preflush
>> FUA   ==> WRUTE,postflush
>
> In ideal word FUA is not obliged to be handled by postflush: it's enough 
> to guarantee that *this* particular request went to platter, other 
> requests may remain not-flushed-yet. 
> Documentation/block/writeback_cache_control.txt is absolutely clear 
> about it:
>
>> The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted 
>> from the
>> filesystem and will make sure that I/O completion for this request is only
>> signaled after the data has been committed to non-volatile storage.
>> ...
>> If the FUA bit is not natively supported the block
>> layer turns it into an empty REQ_FLUSH request after the actual write.
>
>
>> For what reasona we can perform reloc scheme as:
>>
>> RELOC_A: R1,W2:FUA,WBI:FUA,W1:NULLIFY|FUA
>> RELOC_A: R1,W2:FUA,WBI:FUA
>>
>> This allows effectively handle FUA and convert it to DELAYED_FLUSH where
>> possible.
>
> Ploop has the concept of map_multi_updates. In short words, while you're 
> handling one update, many others may come to PLOOP_E_INDEX_DELAY state. 
> And, as soon as the first one is done, we modify many indices in one 
> loop (see map_wb_complete), then write that page to disk only once. 
> Having map_multi_update in mind, it may be suboptimal to make many 
> W2:FUA-s -- it may be better to do many ordinary W2-s instead, and only 
> one pre-FLUSH later -- when we're going to write BAT page on disk.
>
>> Also let's clarify may_fua_delay semantics to exact eng_stat

Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling

2016-06-16 Thread Maxim Patlasov

On 06/16/2016 09:30 AM, Dmitry Monakhov wrote:

Dmitry Monakhov  writes:


Maxim Patlasov  writes:


Dima,

I agree that the ploop barrier code is broken in many ways, but I don't
think the patch actually fixes it. I hope you would agree that
completion of REQ_FUA guarantees only landing that particular bio to the
disk; it says nothing about flushing previously submitted (and
completed) bio-s and it is also possible that power outage may catch us
when this REQ_FUA is already landed to the disk, but previous bio-s are
not yet.

Actually it does (but implicitly) linux handles FUA as FLUSH,W,FLUSH.
So yes. it would be more correct to tag WBI with FLUSH_FUA

Hence, for RELOC_{A|S} requests we actually need something like that:

   RELOC_S: R1, W2, FLUSH:WB, WBI:FUA
   RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA

(i.e. we do need to flush all previously submitted data before starting
to update BAT on disk)


Correct sequence:
RELOC_S: R1, W2, WBI:FLUSH_FUA
RELOC_A: R1, W2, WBI:FLUSH_FUA, W1:NULLIFY:FUA


not simply:


RELOC_S: R1, W2, WBI:FUA
RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA

Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and
PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we
could remove them completely (along we that optimization delaying
incoming FUA) and re-implement all this stuff from scratch:

1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set
REQ_FUA in preq->req_rw before calling ->submit(preq)

2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating
BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for
RELOC_A|S in ploop_index_update and map_wb_complete

3) For that optimization delaying incoming FUA (what we do now if
ploop_req_delay_fua_possible() returns true) we could introduce new
ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update
and map_wb_complete (the same thing as 2) above). And, yes, let's
WARN_ON if we somehow missed its processing.

Yes. This was one of my ideas.
1)FORCE_FLUSH, FORCE_FUA are redundant states which simply mirrors
RELOC_{A,S} semantics. Lets get rid of that crap and simply introduce
PLOOP_IO_FLUSH_DELAYED.
2) fix ->write_page to handle flush as it does with fua.

The only complication I foresee is about how to teach kaio to pre-flush
in kaio_write_page -- it's doable, but involves kaio_resubmit that's
already pretty convoluted.


Yes. kio_submit is correct, but kaio_write_page do not care about REQ_FLUSH.

Crap. Currently kaio can handles fsync only via kaio_queue_fsync_req
which is async and not suitable for page_write.


I think it's doable to process page_write via kaio_fsync_thread, but 
it's tricky.



Max let's make an agreement about terminology.
The reason I wrote this is because linux internally interpret FUA as
preflush,write,postflush which is wrong from academic point of view but
it is the world we live it linux.


Are you sure that this  (FUA == preflush,write,postflush) is universally 
true (i.e. no exceptions)? What about bio-based block-device drivers?



This is the reason I read code
diferently from the way it was designed.
Let's state that ploop is an ideal world where:
FLUSH ==> preflush
FUA   ==> WRUTE,postflush


In ideal word FUA is not obliged to be handled by postflush: it's enough 
to guarantee that *this* particular request went to platter, other 
requests may remain not-flushed-yet. 
Documentation/block/writeback_cache_control.txt is absolutely clear 
about it:


The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted 
from the

filesystem and will make sure that I/O completion for this request is only
signaled after the data has been committed to non-volatile storage.
...
If the FUA bit is not natively supported the block
layer turns it into an empty REQ_FLUSH request after the actual write.




For what reasona we can perform reloc scheme as:

RELOC_A: R1,W2:FUA,WBI:FUA,W1:NULLIFY|FUA
RELOC_A: R1,W2:FUA,WBI:FUA

This allows effectively handle FUA and convert it to DELAYED_FLUSH where
possible.


Ploop has the concept of map_multi_updates. In short words, while you're 
handling one update, many others may come to PLOOP_E_INDEX_DELAY state. 
And, as soon as the first one is done, we modify many indices in one 
loop (see map_wb_complete), then write that page to disk only once. 
Having map_multi_update in mind, it may be suboptimal to make many 
W2:FUA-s -- it may be better to do many ordinary W2-s instead, and only 
one pre-FLUSH later -- when we're going to write BAT page on disk.



Also let's clarify may_fua_delay semantics to exact eng_state

may_fua_delay {

   int may_delay = 1;
   /* effectively this is equivalent of
  preq->eng_state != PLOOP_E_COMPLETE
  but it is more readable, and more error prone in future
   */
   if (preq->eng_state != PLOOP_E_DATA_WBI)
   may_delay = 0
   if ((test_bit(PLOOP_REQ_RELOC_S, &preq->state)) ||
  (test_bit(PLOOP_REQ_RELOC_A, &preq->state)))
   may_delay 

Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling

2016-06-16 Thread Dmitry Monakhov
Dmitry Monakhov  writes:

> Maxim Patlasov  writes:
>
>> Dima,
>>
>> I agree that the ploop barrier code is broken in many ways, but I don't 
>> think the patch actually fixes it. I hope you would agree that 
>> completion of REQ_FUA guarantees only landing that particular bio to the 
>> disk; it says nothing about flushing previously submitted (and 
>> completed) bio-s and it is also possible that power outage may catch us 
>> when this REQ_FUA is already landed to the disk, but previous bio-s are 
>> not yet.
> Actually it does (but implicitly) linux handles FUA as FLUSH,W,FLUSH.
> So yes. it would be more correct to tag WBI with FLUSH_FUA
>> Hence, for RELOC_{A|S} requests we actually need something like that:
>>
>>   RELOC_S: R1, W2, FLUSH:WB, WBI:FUA
>>   RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA
>>
>> (i.e. we do need to flush all previously submitted data before starting 
>> to update BAT on disk)
>>
> Correct sequence:
> RELOC_S: R1, W2, WBI:FLUSH_FUA
> RELOC_A: R1, W2, WBI:FLUSH_FUA, W1:NULLIFY:FUA
>
>> not simply:
>>
>>> RELOC_S: R1, W2, WBI:FUA
>>> RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA
>>
>> Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and 
>> PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we 
>> could remove them completely (along we that optimization delaying 
>> incoming FUA) and re-implement all this stuff from scratch:
>>
>> 1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set 
>> REQ_FUA in preq->req_rw before calling ->submit(preq)
>>
>> 2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating 
>> BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for 
>> RELOC_A|S in ploop_index_update and map_wb_complete
>>
>> 3) For that optimization delaying incoming FUA (what we do now if 
>> ploop_req_delay_fua_possible() returns true) we could introduce new 
>> ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update 
>> and map_wb_complete (the same thing as 2) above). And, yes, let's 
>> WARN_ON if we somehow missed its processing.
> Yes. This was one of my ideas.
> 1)FORCE_FLUSH, FORCE_FUA are redundant states which simply mirrors
> RELOC_{A,S} semantics. Lets get rid of that crap and simply introduce
> PLOOP_IO_FLUSH_DELAYED.
> 2) fix ->write_page to handle flush as it does with fua.
>>
>> The only complication I foresee is about how to teach kaio to pre-flush 
>> in kaio_write_page -- it's doable, but involves kaio_resubmit that's 
>> already pretty convoluted.
>>
> Yes. kio_submit is correct, but kaio_write_page do not care about REQ_FLUSH.
Crap. Currently kaio can handles fsync only via kaio_queue_fsync_req
which is async and not suitable for page_write.
Max let's make an agreement about terminology.
The reason I wrote this is because linux internally interpret FUA as
preflush,write,postflush which is wrong from academic point of view but
it is the world we live it linux. This is the reason I read code
diferently from the way it was designed.
Let's state that ploop is an ideal world where:
FLUSH ==> preflush
FUA   ==> WRUTE,postflush
For what reasona we can perform reloc scheme as:

RELOC_A: R1,W2:FUA,WBI:FUA,W1:NULLIFY|FUA
RELOC_A: R1,W2:FUA,WBI:FUA

This allows effectively handle FUA and convert it to DELAYED_FLUSH where
possible. Also let's clarify may_fua_delay semantics to exact eng_state

may_fua_delay {

  int may_delay = 1;
  /* effectively this is equivalent of
 preq->eng_state != PLOOP_E_COMPLETE
 but it is more readable, and more error prone in future
  */
  if (preq->eng_state != PLOOP_E_DATA_WBI)
  may_delay = 0
  if ((test_bit(PLOOP_REQ_RELOC_S, &preq->state)) ||
 (test_bit(PLOOP_REQ_RELOC_A, &preq->state)))
  may_delay = 0;
  return may_delay;
}





k
>> Btw, I accidentally noticed awful silly bug in kaio_complete_io_state(): 
>> we checks for REQ_FUA after clearing it! This makes all FUA-s on 
>> ordinary kaio_submit path silently lost...
>>
>> Thanks,
>> Maxim
>>
>>
>> On 06/15/2016 07:49 AM, Dmitry Monakhov wrote:
>>> barrier code is broken in many ways:
>>> Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
>>> But request also can goes though ->dio_submit_alloc()->dio_submit_pad and 
>>> write_page (for indexes)
>>> So in case of grow_dev we have following sequance:
>>>
>>> E_RELOC_DATA_READ:
>>>   ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>>>->delta->allocate
>>>   ->io->submit_allloc: dio_submit_alloc
>>> ->dio_submit_pad
>>> E_DATA_WBI : data written, time to update index
>>>->delta->allocate_complete:ploop_index_update
>>>  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>>>  ->write_page
>>>  ->ploop_map_wb_complete
>>>->ploop_wb_complete_post_process
>>>  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>>> E_RELOC_NULLIFY:
>>>
>>> 

Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling

2016-06-16 Thread Dmitry Monakhov
Maxim Patlasov  writes:

> Dima,
>
> I agree that the ploop barrier code is broken in many ways, but I don't 
> think the patch actually fixes it. I hope you would agree that 
> completion of REQ_FUA guarantees only landing that particular bio to the 
> disk; it says nothing about flushing previously submitted (and 
> completed) bio-s and it is also possible that power outage may catch us 
> when this REQ_FUA is already landed to the disk, but previous bio-s are 
> not yet.
Actually it does (but implicitly) linux handles FUA as FLUSH,W,FLUSH.
So yes. it would be more correct to tag WBI with FLUSH_FUA
> Hence, for RELOC_{A|S} requests we actually need something like that:
>
>   RELOC_S: R1, W2, FLUSH:WB, WBI:FUA
>   RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA
>
> (i.e. we do need to flush all previously submitted data before starting 
> to update BAT on disk)
>
Correct sequence:
RELOC_S: R1, W2, WBI:FLUSH_FUA
RELOC_A: R1, W2, WBI:FLUSH_FUA, W1:NULLIFY:FUA

> not simply:
>
>> RELOC_S: R1, W2, WBI:FUA
>> RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA
>
> Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and 
> PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we 
> could remove them completely (along we that optimization delaying 
> incoming FUA) and re-implement all this stuff from scratch:
>
> 1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set 
> REQ_FUA in preq->req_rw before calling ->submit(preq)
>
> 2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating 
> BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for 
> RELOC_A|S in ploop_index_update and map_wb_complete
>
> 3) For that optimization delaying incoming FUA (what we do now if 
> ploop_req_delay_fua_possible() returns true) we could introduce new 
> ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update 
> and map_wb_complete (the same thing as 2) above). And, yes, let's 
> WARN_ON if we somehow missed its processing.
Yes. This was one of my ideas.
1)FORCE_FLUSH, FORCE_FUA are redundant states which simply mirrors
RELOC_{A,S} semantics. Lets get rid of that crap and simply introduce
PLOOP_IO_FLUSH_DELAYED.
2) fix ->write_page to handle flush as it does with fua.
>
> The only complication I foresee is about how to teach kaio to pre-flush 
> in kaio_write_page -- it's doable, but involves kaio_resubmit that's 
> already pretty convoluted.
>
Yes. kio_submit is correct, but kaio_write_page do not care about REQ_FLUSH.
> Btw, I accidentally noticed awful silly bug in kaio_complete_io_state(): 
> we checks for REQ_FUA after clearing it! This makes all FUA-s on 
> ordinary kaio_submit path silently lost...
>
> Thanks,
> Maxim
>
>
> On 06/15/2016 07:49 AM, Dmitry Monakhov wrote:
>> barrier code is broken in many ways:
>> Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
>> But request also can goes though ->dio_submit_alloc()->dio_submit_pad and 
>> write_page (for indexes)
>> So in case of grow_dev we have following sequance:
>>
>> E_RELOC_DATA_READ:
>>   ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>>->delta->allocate
>>   ->io->submit_allloc: dio_submit_alloc
>> ->dio_submit_pad
>> E_DATA_WBI : data written, time to update index
>>->delta->allocate_complete:ploop_index_update
>>  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>>  ->write_page
>>  ->ploop_map_wb_complete
>>->ploop_wb_complete_post_process
>>  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>> E_RELOC_NULLIFY:
>>
>> ->submit()
>>
>> This patch unify barrier handling like follows:
>> - Add assertation to ploop_complete_request for FORCE_{FLUSH,FUA} state
>> - Perform explicit FUA inside index_update for RELOC requests.
>>
>> This makes reloc sequence optimal:
>> RELOC_S: R1, W2, WBI:FUA
>> RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA
>>
>> https://jira.sw.ru/browse/PSBM-47107
>> Signed-off-by: Dmitry Monakhov 
>> ---
>>   drivers/block/ploop/dev.c | 10 +++---
>>   drivers/block/ploop/map.c | 29 -
>>   2 files changed, 19 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
>> index 96f7850..998fe71 100644
>> --- a/drivers/block/ploop/dev.c
>> +++ b/drivers/block/ploop/dev.c
>> @@ -1224,6 +1224,11 @@ static void ploop_complete_request(struct 
>> ploop_request * preq)
>>   
>>  __TRACE("Z %p %u\n", preq, preq->req_cluster);
>>   
>> +if (!preq->error) {
>> +unsigned long state =  READ_ONCE(preq->state);
>> +WARN_ON(state & (1 << PLOOP_REQ_FORCE_FUA));
>> +WARN_ON(state & (1 <> +}
>>  while (preq->bl.head) {
>>  struct bio * bio = preq->bl.head;
>>  preq->bl.head = bio->bi_next;
>> @@ -2530,9 +2535,8 @@ restart:
>>  top_delta =

Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling

2016-06-15 Thread Maxim Patlasov

Dima,

I agree that the ploop barrier code is broken in many ways, but I don't 
think the patch actually fixes it. I hope you would agree that 
completion of REQ_FUA guarantees only landing that particular bio to the 
disk; it says nothing about flushing previously submitted (and 
completed) bio-s and it is also possible that power outage may catch us 
when this REQ_FUA is already landed to the disk, but previous bio-s are 
not yet.


Hence, for RELOC_{A|S} requests we actually need something like that:

 RELOC_S: R1, W2, FLUSH:WB, WBI:FUA
 RELOC_A: R1, W2, FLUSH:WB, WBI:FUA, W1:NULLIFY:FUA

(i.e. we do need to flush all previously submitted data before starting 
to update BAT on disk)


not simply:


RELOC_S: R1, W2, WBI:FUA
RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA


Also, the patch makes the meaning of PLOOP_REQ_FORCE_FUA and 
PLOOP_REQ_FORCE_FLUSH even more obscure than it used to be. I think we 
could remove them completely (along we that optimization delaying 
incoming FUA) and re-implement all this stuff from scratch:


1) The final "NULLIFY:FUA" is a peace of cake -- it's enough to set 
REQ_FUA in preq->req_rw before calling ->submit(preq)


2) For "FLUSH:WB, WBI:FUA" it is actually enough to send bio updating 
BAT on disk as REQ_FLUSH|REQ_FUA -- we can specify it explicitly for 
RELOC_A|S in ploop_index_update and map_wb_complete


3) For that optimization delaying incoming FUA (what we do now if 
ploop_req_delay_fua_possible() returns true) we could introduce new 
ad-hoc PLOOP_IO_FLUSH_DELAYED enforcing REQ_FLUSH in ploop_index_update 
and map_wb_complete (the same thing as 2) above). And, yes, let's 
WARN_ON if we somehow missed its processing.


The only complication I foresee is about how to teach kaio to pre-flush 
in kaio_write_page -- it's doable, but involves kaio_resubmit that's 
already pretty convoluted.


Btw, I accidentally noticed awful silly bug in kaio_complete_io_state(): 
we checks for REQ_FUA after clearing it! This makes all FUA-s on 
ordinary kaio_submit path silently lost...


Thanks,
Maxim


On 06/15/2016 07:49 AM, Dmitry Monakhov wrote:

barrier code is broken in many ways:
Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
But request also can goes though ->dio_submit_alloc()->dio_submit_pad and 
write_page (for indexes)
So in case of grow_dev we have following sequance:

E_RELOC_DATA_READ:
  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
   ->delta->allocate
  ->io->submit_allloc: dio_submit_alloc
->dio_submit_pad
E_DATA_WBI : data written, time to update index
   ->delta->allocate_complete:ploop_index_update
 ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
 ->write_page
 ->ploop_map_wb_complete
   ->ploop_wb_complete_post_process
 ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
E_RELOC_NULLIFY:

->submit()

This patch unify barrier handling like follows:
- Add assertation to ploop_complete_request for FORCE_{FLUSH,FUA} state
- Perform explicit FUA inside index_update for RELOC requests.

This makes reloc sequence optimal:
RELOC_S: R1, W2, WBI:FUA
RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA

https://jira.sw.ru/browse/PSBM-47107
Signed-off-by: Dmitry Monakhov 
---
  drivers/block/ploop/dev.c | 10 +++---
  drivers/block/ploop/map.c | 29 -
  2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 96f7850..998fe71 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1224,6 +1224,11 @@ static void ploop_complete_request(struct ploop_request 
* preq)
  
  	__TRACE("Z %p %u\n", preq, preq->req_cluster);
  
+	if (!preq->error) {

+   unsigned long state =  READ_ONCE(preq->state);
+   WARN_ON(state & (1 << PLOOP_REQ_FORCE_FUA));
+   WARN_ON(state & (1 bl.head;
preq->bl.head = bio->bi_next;
@@ -2530,9 +2535,8 @@ restart:
top_delta = ploop_top_delta(plo);
sbl.head = sbl.tail = preq->aux_bio;
  
-		/* Relocated data write required sync before BAT updatee */

-   set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
-
+   /* Relocated data write required sync before BAT updatee
+* this will happen inside index_update */
if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
preq->eng_state = PLOOP_E_DATA_WBI;
plo->st.bio_out++;
diff --git a/drivers/block/ploop/map.c b/drivers/block/ploop/map.c
index 3a6365d..c17e598 100644
--- a/drivers/block/ploop/map.c
+++ b/drivers/block/ploop/map.c
@@ -896,6 +896,7 @@ void ploop_index_update(struct ploop_request * preq)
struct ploop_device * plo = preq->plo;
struct map_node * m = preq->map;
 

[Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling

2016-06-15 Thread Dmitry Monakhov
barrier code is broken in many ways:
Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
But request also can goes though ->dio_submit_alloc()->dio_submit_pad and 
write_page (for indexes)
So in case of grow_dev we have following sequance:

E_RELOC_DATA_READ:
 ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
  ->delta->allocate
 ->io->submit_allloc: dio_submit_alloc
   ->dio_submit_pad
E_DATA_WBI : data written, time to update index
  ->delta->allocate_complete:ploop_index_update
->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
->write_page
->ploop_map_wb_complete
  ->ploop_wb_complete_post_process
->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
E_RELOC_NULLIFY:

   ->submit()

This patch unify barrier handling like follows:
- Add assertation to ploop_complete_request for FORCE_{FLUSH,FUA} state
- Perform explicit FUA inside index_update for RELOC requests.

This makes reloc sequence optimal:
RELOC_S: R1, W2, WBI:FUA
RELOC_A: R1, W2, WBI:FUA, W1:NULLIFY:FUA

https://jira.sw.ru/browse/PSBM-47107
Signed-off-by: Dmitry Monakhov 
---
 drivers/block/ploop/dev.c | 10 +++---
 drivers/block/ploop/map.c | 29 -
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 96f7850..998fe71 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1224,6 +1224,11 @@ static void ploop_complete_request(struct ploop_request 
* preq)
 
__TRACE("Z %p %u\n", preq, preq->req_cluster);
 
+   if (!preq->error) {
+   unsigned long state =  READ_ONCE(preq->state);
+   WARN_ON(state & (1 << PLOOP_REQ_FORCE_FUA));
+   WARN_ON(state & (1 bl.head;
preq->bl.head = bio->bi_next;
@@ -2530,9 +2535,8 @@ restart:
top_delta = ploop_top_delta(plo);
sbl.head = sbl.tail = preq->aux_bio;
 
-   /* Relocated data write required sync before BAT updatee */
-   set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
-
+   /* Relocated data write required sync before BAT updatee
+* this will happen inside index_update */
if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
preq->eng_state = PLOOP_E_DATA_WBI;
plo->st.bio_out++;
diff --git a/drivers/block/ploop/map.c b/drivers/block/ploop/map.c
index 3a6365d..c17e598 100644
--- a/drivers/block/ploop/map.c
+++ b/drivers/block/ploop/map.c
@@ -896,6 +896,7 @@ void ploop_index_update(struct ploop_request * preq)
struct ploop_device * plo = preq->plo;
struct map_node * m = preq->map;
struct ploop_delta * top_delta = map_top_delta(m->parent);
+   int fua = !!(preq->req_rw & REQ_FUA);
u32 idx;
map_index_t blk;
int old_level;
@@ -953,13 +954,13 @@ void ploop_index_update(struct ploop_request * preq)
__TRACE("wbi %p %u %p\n", preq, preq->req_cluster, m);
plo->st.map_single_writes++;
top_delta->ops->map_index(top_delta, m->mn_start, &sec);
-   /* Relocate requires consistent writes, mark such reqs appropriately */
+   /* Relocate requires consistent index update */
if (test_bit(PLOOP_REQ_RELOC_A, &preq->state) ||
test_bit(PLOOP_REQ_RELOC_S, &preq->state))
-   set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
-
-   top_delta->io.ops->write_page(&top_delta->io, preq, page, sec,
- !!(preq->req_rw & REQ_FUA));
+   fua = 1;
+   if (fua)
+   clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state);
+   top_delta->io.ops->write_page(&top_delta->io, preq, page, sec, fua);
put_page(page);
return;
 
@@ -1078,7 +1079,7 @@ static void map_wb_complete(struct map_node * m, int err)
int delayed = 0;
unsigned int idx;
sector_t sec;
-   int fua, force_fua;
+   int fua;
 
/* First, complete processing of written back indices,
 * finally instantiate indices in mapping cache.
@@ -1149,7 +1150,6 @@ static void map_wb_complete(struct map_node * m, int err)
 
main_preq = NULL;
fua = 0;
-   force_fua = 0;
 
list_for_each_safe(cursor, tmp, &m->io_queue) {
struct ploop_request * preq;
@@ -1168,13 +1168,12 @@ static void map_wb_complete(struct map_node * m, int 
err)
break;
}
 
-   if (preq->req_rw & REQ_FUA)
+   if (preq->req_rw & REQ_FUA ||
+   test_bit(PLOOP_REQ_RELOC_A, &preq->state) ||
+   test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
+   clear_bit(PLOOP_REQ