Re: [pve-devel] [RFC PATCH manager] pvestatd: fix status server udp overflow

2020-03-04 Thread Thomas Lamprecht
On 3/4/20 10:01 AM, Dominik Csapak wrote:
> On 3/4/20 9:46 AM, Thomas Lamprecht wrote:
>> On 2/27/20 4:14 PM, Dominik Csapak wrote:
>>> for big vms (many disks/nics) a single status update can be bigger
>>> than 20k (i measured about 21k for a vm with 31 disks and nics)
>>> which means that a single update can bring the length of
>>> $txn->{data} over 65k (if it was >45k before)
>>
>> nice catch
>>
>>>
>>> we could reduce the limit when we send, but this would only move
>>> the problem, and we might run into it later on again
>>>
>>> to solve it (mostly) completely, append the data to $txn->{olddata}
>>> as long as the new and old data together do not get over 65k
>>>
>>> in that case, flush the olddata instead
>>>
>>> on finish flush olddata and if somehow data was left in data, flush
>>> that also
>>>
>>> one problem remains: if a single update from a vm/node/etc.. is over
>>> 65k, but this would require a bigger refactoring of the status
>>> plugins...
>>
>> hmm, your changes seems wrong, or better said unnecessarily complicated
>> while not solving all issues..
>>
>> I'll simply loop over data in 64k chunks if it's bigger than that and
>> send those out, simple package segmentation..
>>
> 
> but i guess that won't work properly, since e.g. the influxdb
> protocol is line-based and i guess it does not concatenate
> multiple udp packets... so if a line gets cut right in the middle
> the first part puts potentially wrong data into the db
> and the second is not the right syntax...
> 

do not just guess around.. UDP segmentation is something programs must
cope with, something in the network stack could even to this, else they
should use TCP..
And if this really is a problem, then:
1. curse at those *@?$! time series devs
2. still batch send but let the segmentation handle the plugin,
   so it can split one newline or whatever...

But i would take a solution that works, not a hack which solves some
parts only to redo it all in a few months anyway

___
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [RFC PATCH manager] pvestatd: fix status server udp overflow

2020-03-04 Thread Dominik Csapak

On 3/4/20 9:46 AM, Thomas Lamprecht wrote:

On 2/27/20 4:14 PM, Dominik Csapak wrote:

for big vms (many disks/nics) a single status update can be bigger
than 20k (i measured about 21k for a vm with 31 disks and nics)
which means that a single update can bring the length of
$txn->{data} over 65k (if it was >45k before)


nice catch



we could reduce the limit when we send, but this would only move
the problem, and we might run into it later on again

to solve it (mostly) completely, append the data to $txn->{olddata}
as long as the new and old data together do not get over 65k

in that case, flush the olddata instead

on finish flush olddata and if somehow data was left in data, flush
that also

one problem remains: if a single update from a vm/node/etc.. is over
65k, but this would require a bigger refactoring of the status
plugins...


hmm, your changes seems wrong, or better said unnecessarily complicated
while not solving all issues..

I'll simply loop over data in 64k chunks if it's bigger than that and
send those out, simple package segmentation..



but i guess that won't work properly, since e.g. the influxdb
protocol is line-based and i guess it does not concatenate
multiple udp packets... so if a line gets cut right in the middle
the first part puts potentially wrong data into the db
and the second is not the right syntax...



Signed-off-by: Dominik Csapak 
---
  PVE/ExtMetric.pm | 32 ++--
  1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm
index 14d98317..2f52b883 100644
--- a/PVE/ExtMetric.pm
+++ b/PVE/ExtMetric.pm
@@ -34,10 +34,16 @@ sub update_all($$@) {
  
  	$plugin->$method($txn, @params);
  
-	if (length($txn->{data}) > 48000) {

+   $txn->{olddata} //= '';
+   $txn->{data} //= '';
+
+   if ((length($txn->{olddata}) + length($txn->{data})) > 65000) {
# UDP stack cannot handle messages > 65k, if we've alot of data we
# do smaller batch sends then, but keep the connection alive
-   transaction_flush($txn, 1);
+   transaction_flush($txn, 1, 1);
+   $txn->{olddata} = delete $txn->{data};
+   } else {
+   $txn->{olddata} .= delete $txn->{data};
}
  }
  }
@@ -70,17 +76,27 @@ sub transactions_start {
  }
  
  sub transaction_flush {

-my ($txn, $keepconnected) = @_;
+my ($txn, $keepconnected, $useolddata) = @_;
  
  if (!$txn->{connection}) {

return if !$txn->{data}; # OK, if data was already sent/flushed
die "cannot flush metric data, no connection available!\n";
  }
-return if !defined($txn->{data}) || $txn->{data} eq '';
+if ($useolddata) {
+   return if !defined($txn->{olddata}) || $txn->{olddata} eq '';
+} else {
+   return if !defined($txn->{data}) || $txn->{data} eq '';
+}
  
  my $plugin = PVE::Status::Plugin->lookup($txn->{cfg}->{type});
  
-my $data = delete $txn->{data};

+my $data;
+if ($useolddata) {
+   $data = delete $txn->{olddata};
+} else {
+   $data = delete $txn->{data};
+}
+
  eval { $plugin->send($txn->{connection}, $data) };
  my $senderr = $@;
  
@@ -97,7 +113,11 @@ sub transactions_finish {

  my ($transactions) = @_;
  
  for my $txn (@$transactions) {

-   eval { transaction_flush($txn) };
+   eval { transaction_flush($txn, undef, 1) };
+   if ($txn->{data}) {
+   # should not happen, just for safety
+   eval { transaction_flush($txn) };
+   }
warn "$@" if $@;
  }
  }






___
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [RFC PATCH manager] pvestatd: fix status server udp overflow

2020-03-04 Thread Thomas Lamprecht
On 2/27/20 4:14 PM, Dominik Csapak wrote:
> for big vms (many disks/nics) a single status update can be bigger
> than 20k (i measured about 21k for a vm with 31 disks and nics)
> which means that a single update can bring the length of
> $txn->{data} over 65k (if it was >45k before)

nice catch

> 
> we could reduce the limit when we send, but this would only move
> the problem, and we might run into it later on again
> 
> to solve it (mostly) completely, append the data to $txn->{olddata}
> as long as the new and old data together do not get over 65k
> 
> in that case, flush the olddata instead
> 
> on finish flush olddata and if somehow data was left in data, flush
> that also
> 
> one problem remains: if a single update from a vm/node/etc.. is over
> 65k, but this would require a bigger refactoring of the status
> plugins...

hmm, your changes seems wrong, or better said unnecessarily complicated
while not solving all issues..

I'll simply loop over data in 64k chunks if it's bigger than that and
send those out, simple package segmentation..

> 
> Signed-off-by: Dominik Csapak 
> ---
>  PVE/ExtMetric.pm | 32 ++--
>  1 file changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm
> index 14d98317..2f52b883 100644
> --- a/PVE/ExtMetric.pm
> +++ b/PVE/ExtMetric.pm
> @@ -34,10 +34,16 @@ sub update_all($$@) {
>  
>   $plugin->$method($txn, @params);
>  
> - if (length($txn->{data}) > 48000) {
> + $txn->{olddata} //= '';
> + $txn->{data} //= '';
> +
> + if ((length($txn->{olddata}) + length($txn->{data})) > 65000) {
>   # UDP stack cannot handle messages > 65k, if we've alot of data we
>   # do smaller batch sends then, but keep the connection alive
> - transaction_flush($txn, 1);
> + transaction_flush($txn, 1, 1);
> + $txn->{olddata} = delete $txn->{data};
> + } else {
> + $txn->{olddata} .= delete $txn->{data};
>   }
>  }
>  }
> @@ -70,17 +76,27 @@ sub transactions_start {
>  }
>  
>  sub transaction_flush {
> -my ($txn, $keepconnected) = @_;
> +my ($txn, $keepconnected, $useolddata) = @_;
>  
>  if (!$txn->{connection}) {
>   return if !$txn->{data}; # OK, if data was already sent/flushed
>   die "cannot flush metric data, no connection available!\n";
>  }
> -return if !defined($txn->{data}) || $txn->{data} eq '';
> +if ($useolddata) {
> + return if !defined($txn->{olddata}) || $txn->{olddata} eq '';
> +} else {
> + return if !defined($txn->{data}) || $txn->{data} eq '';
> +}
>  
>  my $plugin = PVE::Status::Plugin->lookup($txn->{cfg}->{type});
>  
> -my $data = delete $txn->{data};
> +my $data;
> +if ($useolddata) {
> + $data = delete $txn->{olddata};
> +} else {
> + $data = delete $txn->{data};
> +}
> +
>  eval { $plugin->send($txn->{connection}, $data) };
>  my $senderr = $@;
>  
> @@ -97,7 +113,11 @@ sub transactions_finish {
>  my ($transactions) = @_;
>  
>  for my $txn (@$transactions) {
> - eval { transaction_flush($txn) };
> + eval { transaction_flush($txn, undef, 1) };
> + if ($txn->{data}) {
> + # should not happen, just for safety
> + eval { transaction_flush($txn) };
> + }
>   warn "$@" if $@;
>  }
>  }
> 


___
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] [RFC PATCH manager] pvestatd: fix status server udp overflow

2020-02-27 Thread Dominik Csapak
for big vms (many disks/nics) a single status update can be bigger
than 20k (i measured about 21k for a vm with 31 disks and nics)
which means that a single update can bring the length of
$txn->{data} over 65k (if it was >45k before)

we could reduce the limit when we send, but this would only move
the problem, and we might run into it later on again

to solve it (mostly) completely, append the data to $txn->{olddata}
as long as the new and old data together do not get over 65k

in that case, flush the olddata instead

on finish flush olddata and if somehow data was left in data, flush
that also

one problem remains: if a single update from a vm/node/etc.. is over
65k, but this would require a bigger refactoring of the status
plugins...

Signed-off-by: Dominik Csapak 
---
 PVE/ExtMetric.pm | 32 ++--
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm
index 14d98317..2f52b883 100644
--- a/PVE/ExtMetric.pm
+++ b/PVE/ExtMetric.pm
@@ -34,10 +34,16 @@ sub update_all($$@) {
 
$plugin->$method($txn, @params);
 
-   if (length($txn->{data}) > 48000) {
+   $txn->{olddata} //= '';
+   $txn->{data} //= '';
+
+   if ((length($txn->{olddata}) + length($txn->{data})) > 65000) {
# UDP stack cannot handle messages > 65k, if we've alot of data we
# do smaller batch sends then, but keep the connection alive
-   transaction_flush($txn, 1);
+   transaction_flush($txn, 1, 1);
+   $txn->{olddata} = delete $txn->{data};
+   } else {
+   $txn->{olddata} .= delete $txn->{data};
}
 }
 }
@@ -70,17 +76,27 @@ sub transactions_start {
 }
 
 sub transaction_flush {
-my ($txn, $keepconnected) = @_;
+my ($txn, $keepconnected, $useolddata) = @_;
 
 if (!$txn->{connection}) {
return if !$txn->{data}; # OK, if data was already sent/flushed
die "cannot flush metric data, no connection available!\n";
 }
-return if !defined($txn->{data}) || $txn->{data} eq '';
+if ($useolddata) {
+   return if !defined($txn->{olddata}) || $txn->{olddata} eq '';
+} else {
+   return if !defined($txn->{data}) || $txn->{data} eq '';
+}
 
 my $plugin = PVE::Status::Plugin->lookup($txn->{cfg}->{type});
 
-my $data = delete $txn->{data};
+my $data;
+if ($useolddata) {
+   $data = delete $txn->{olddata};
+} else {
+   $data = delete $txn->{data};
+}
+
 eval { $plugin->send($txn->{connection}, $data) };
 my $senderr = $@;
 
@@ -97,7 +113,11 @@ sub transactions_finish {
 my ($transactions) = @_;
 
 for my $txn (@$transactions) {
-   eval { transaction_flush($txn) };
+   eval { transaction_flush($txn, undef, 1) };
+   if ($txn->{data}) {
+   # should not happen, just for safety
+   eval { transaction_flush($txn) };
+   }
warn "$@" if $@;
 }
 }
-- 
2.20.1


___
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel