Re: [PATCH v2 06/20] qemu_file: total_transferred is not used anymore

2023-06-22 Thread Peter Xu
On Thu, Jun 22, 2023 at 01:05:49AM +0200, Juan Quintela wrote:
> Peter Xu  wrote:
> > On Tue, May 30, 2023 at 08:39:27PM +0200, Juan Quintela wrote:
> >> Signed-off-by: Juan Quintela 
> >> ---
> >>  migration/qemu-file.c | 4 
> >>  1 file changed, 4 deletions(-)
> >> 
> >> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >> index eb0497e532..6b6deea19b 100644
> >> --- a/migration/qemu-file.c
> >> +++ b/migration/qemu-file.c
> >> @@ -41,9 +41,6 @@ struct QEMUFile {
> >>  QIOChannel *ioc;
> >>  bool is_writable;
> >>  
> >> -/* The sum of bytes transferred on the wire */
> >> -uint64_t total_transferred;
> >> -
> >>  int buf_index;
> >>  int buf_size; /* 0 when writing */
> >>  uint8_t buf[IO_BUF_SIZE];
> >> @@ -287,7 +284,6 @@ void qemu_fflush(QEMUFile *f)
> >>  qemu_file_set_error_obj(f, -EIO, local_error);
> >>  } else {
> >>  uint64_t size = iov_size(f->iov, f->iovcnt);
> >> -f->total_transferred += size;
> >
> > I think this patch is another example why I think sometimes the way patch
> > is split are pretty much adding more complexity on review...
> 
> It depends of taste.
> 
> You are doing one thing in way1.
> Then you find a better way to do it, lets call it way2.
> 
> Now we have two options to see how we arrived there.
> 
> a- You got any declarations/definition/initializations for way2
> b- You write way2 alongside way1
> c- You test that both ways give the same result, and you see that they
>give the same result.
> d- you remove the way1.
> 
> Or you squash the four patches in a single patch.  But then the reviewer
> lost the place where one can see why it is the same than the old one.

For this patch to remove total_transferred, IMHO as a reviewer it'll be
easier to me if it's put in the same patch where it got replaced.

It might be different if we're going to remove a large chunk of code, but
for this patch it's a few lines of change.

> 
> Sometimes is better the longer way, sometimes is better the short one.
> 
> Clearly we don't agree about what is the best way in this case.
> 
> > Here we removed a variable operation but it seems all fine if it's not used
> > anywhere.  But it also means current code base (before this patch applied)
> > doesn't make sense already because it contains this useless addition.  So
> > IMHO it means some previous patch does it just wrong.
> 
> No.  It is how it is developed.  And being respectful with the
> reviewer.  Given it enough information to do a proper review.

I never doubted about that.  I trust that you were trying to provide the
best for a reviewer and I appreciate that.

> 
> During the development of this series, there were lots of:
> 
> if (old_counter != new_counter)
>printf("");

I assume you meant when debugging?  I may not have read all the patches; I
hope we just don't do that if possible in any real git commit.

> 
> traces were in the several thousand lines long.  If I have to review
> that change, I would love any help that writer can give me.  That is why
> it is done this way.

Yeah, I think it's probably that even from reviewers' perspective the need
can be different from individuals.

> 
> > I think it means it's caused by a wrong split of patches, then each patch
> > stops to make much sense as a standalone one.
> 
> It stops making sense if you want each feature to be a single patch.
> Before the patch no feature.  After the patch full feature.  That brings
> us to very long patches.
> 
> What is easier to review (to do the same)
> 
> a - 1 x 1000 lines patch
> b - 10 x 100 lines patch
> 
> I will go with b any time.  Except if the split is arbitrary.

AFAIU, this is a different thing.  I'm never against splitting patch, but
about how to split.  I was never asking for a 1000 LOC patch, right? :)

> 
> > I can go back and try to find whatever patch on the list that will explain
> > this.  But it'll also go into git log.  Anyone reads this later will be
> > confused once again.  Even harder for them to figure out what
> > happened.
> 
> As said before, I completely disagree here.  And what is worse.  If it
> gets wrong, with your approach git bisect will not help as much than
> with my appreach.
> 
> > Do you think we could reorganize the patches so each of a single patch
> > explains itself?
> 
> No.  See before.  We go for a very spaguetti code to a much less
> spaguety code.
> 
> > The other thing is about priority of patches - I still have ~80 patches
> > pending reviews on migration only.. Would you think it makes sense we pickg
> > up important ones first and merge them with higher priority?
> 
> Ok, lets make this clear.
> This whole atomic migration counters started because the zero_page
> detection in multifd had the counters so wrong that meassuring speed
> become impossible.
> 
> I haven't yet send the multifd zero pages.  And why was it so
> complicated.  Just on top of my memory.
> 
> - how much data had we transferred.  

Re: [PATCH v2 06/20] qemu_file: total_transferred is not used anymore

2023-06-21 Thread Juan Quintela
Peter Xu  wrote:
> On Tue, May 30, 2023 at 08:39:27PM +0200, Juan Quintela wrote:
>> Signed-off-by: Juan Quintela 
>> ---
>>  migration/qemu-file.c | 4 
>>  1 file changed, 4 deletions(-)
>> 
>> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> index eb0497e532..6b6deea19b 100644
>> --- a/migration/qemu-file.c
>> +++ b/migration/qemu-file.c
>> @@ -41,9 +41,6 @@ struct QEMUFile {
>>  QIOChannel *ioc;
>>  bool is_writable;
>>  
>> -/* The sum of bytes transferred on the wire */
>> -uint64_t total_transferred;
>> -
>>  int buf_index;
>>  int buf_size; /* 0 when writing */
>>  uint8_t buf[IO_BUF_SIZE];
>> @@ -287,7 +284,6 @@ void qemu_fflush(QEMUFile *f)
>>  qemu_file_set_error_obj(f, -EIO, local_error);
>>  } else {
>>  uint64_t size = iov_size(f->iov, f->iovcnt);
>> -f->total_transferred += size;
>
> I think this patch is another example why I think sometimes the way patch
> is split are pretty much adding more complexity on review...

It depends of taste.

You are doing one thing in way1.
Then you find a better way to do it, lets call it way2.

Now we have two options to see how we arrived there.

a- You got any declarations/definition/initializations for way2
b- You write way2 alongside way1
c- You test that both ways give the same result, and you see that they
   give the same result.
d- you remove the way1.

Or you squash the four patches in a single patch.  But then the reviewer
lost the place where one can see why it is the same than the old one.

Sometimes is better the longer way, sometimes is better the short one.

Clearly we don't agree about what is the best way in this case.

> Here we removed a variable operation but it seems all fine if it's not used
> anywhere.  But it also means current code base (before this patch applied)
> doesn't make sense already because it contains this useless addition.  So
> IMHO it means some previous patch does it just wrong.

No.  It is how it is developed.  And being respectful with the
reviewer.  Given it enough information to do a proper review.

During the development of this series, there were lots of:

if (old_counter != new_counter)
   printf("");

traces were in the several thousand lines long.  If I have to review
that change, I would love any help that writer can give me.  That is why
it is done this way.

> I think it means it's caused by a wrong split of patches, then each patch
> stops to make much sense as a standalone one.

It stops making sense if you want each feature to be a single patch.
Before the patch no feature.  After the patch full feature.  That brings
us to very long patches.

What is easier to review (to do the same)

a - 1 x 1000 lines patch
b - 10 x 100 lines patch

I will go with b any time.  Except if the split is arbitrary.

> I can go back and try to find whatever patch on the list that will explain
> this.  But it'll also go into git log.  Anyone reads this later will be
> confused once again.  Even harder for them to figure out what
> happened.

As said before, I completely disagree here.  And what is worse.  If it
gets wrong, with your approach git bisect will not help as much than
with my appreach.

> Do you think we could reorganize the patches so each of a single patch
> explains itself?

No.  See before.  We go for a very spaguetti code to a much less
spaguety code.

> The other thing is about priority of patches - I still have ~80 patches
> pending reviews on migration only.. Would you think it makes sense we pickg
> up important ones first and merge them with higher priority?

Ok, lets make this clear.
This whole atomic migration counters started because the zero_page
detection in multifd had the counters so wrong that meassuring speed
become impossible.

I haven't yet send the multifd zero pages.  And why was it so
complicated.  Just on top of my memory.

- how much data had we transferred.  Historically we stored that
  information on qemu-file.  But qemu-file can only be read/written from
  the migration thread.  So we went through jumps to be able to update
  that values.

  Current upstream code for compressed multifd assumes that it transfer
  as much data as non compressed one.  Why?  because we don't have an
  easy way to get that value back.  Contorsions that we were trying to
  do:

  https://lore.kernel.org/all/20220802063907.18882-5-quint...@redhat.com/

  To resume, the way that we had to do it was something like:

  - we send a bunch of pages to multifd thread
  - multifd thread send data and returns on the buffer what has written
  - migration thread when reuses a buffer adds the written stuff from
previous time than the struct was used.

  This was not just problematic from multifd zero pages detection.
  * compression was lying about it
  * zero_copy is doing it wrong (accounting at the time that it does the
write, not when it knows that it was written).

- rdma: this is even funnier
  * It accounted for 

Re: [PATCH v2 06/20] qemu_file: total_transferred is not used anymore

2023-06-14 Thread Peter Xu
On Tue, May 30, 2023 at 08:39:27PM +0200, Juan Quintela wrote:
> Signed-off-by: Juan Quintela 
> ---
>  migration/qemu-file.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index eb0497e532..6b6deea19b 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -41,9 +41,6 @@ struct QEMUFile {
>  QIOChannel *ioc;
>  bool is_writable;
>  
> -/* The sum of bytes transferred on the wire */
> -uint64_t total_transferred;
> -
>  int buf_index;
>  int buf_size; /* 0 when writing */
>  uint8_t buf[IO_BUF_SIZE];
> @@ -287,7 +284,6 @@ void qemu_fflush(QEMUFile *f)
>  qemu_file_set_error_obj(f, -EIO, local_error);
>  } else {
>  uint64_t size = iov_size(f->iov, f->iovcnt);
> -f->total_transferred += size;

I think this patch is another example why I think sometimes the way patch
is split are pretty much adding more complexity on review...

Here we removed a variable operation but it seems all fine if it's not used
anywhere.  But it also means current code base (before this patch applied)
doesn't make sense already because it contains this useless addition.  So
IMHO it means some previous patch does it just wrong.

I think it means it's caused by a wrong split of patches, then each patch
stops to make much sense as a standalone one.

I can go back and try to find whatever patch on the list that will explain
this.  But it'll also go into git log.  Anyone reads this later will be
confused once again.  Even harder for them to figure out what happened.

Do you think we could reorganize the patches so each of a single patch
explains itself?

The other thing is about priority of patches - I still have ~80 patches
pending reviews on migration only.. Would you think it makes sense we pick
up important ones first and merge them with higher priority?

What I have in mind are:

  - The regression you mentioned on qemu_fflush() when ram save (I hope I
understand the problem right, though...).

  - The slowness of migration-test.  I'm not sure whether you have anything
better than either Dan's approach or the switchover-hold flags, I just
think that seems more important to resolve for us upstream.  We can
merge Dan's or mine, you can also propose something else, but IMHO that
seems to be a higher priority?

And whatever else I haven't noticed.  I'll continue reading but I'm sure
you know the best on this..  so I'd really rely on you.

What do you think?

Thanks,

-- 
Peter Xu