Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-09 Thread Thomas Lamprecht
On 1/9/19 11:18 AM, Alexandre DERUMIER wrote:
>>> I can trigger the bug by simply restarting the pve-cluster service. (I think it
>>> unmounts/remounts /etc/pve.)
>>> That simply breaks my firewalled connections in my VMs.
> 
> Could we add a simple check to see if /etc/pve is mounted?

Yes, sounds reasonable, as this happens on every pve-cluster update.
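A minimal sketch of what such a check could look like (the function name and the /proc/mounts approach are illustrative, not the actual pve-firewall code):

```shell
# Return 0 if the given path is currently a mountpoint, 1 otherwise.
# Reads /proc/mounts directly so it also works in minimal environments
# without mountpoint(8).
is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' /proc/mounts
}

# Example: refuse to touch firewall rules while /etc/pve is not mounted.
if is_mounted /etc/pve; then
    echo "/etc/pve is mounted, safe to reload rules"
else
    echo "/etc/pve not mounted, keeping current rules" >&2
fi
```

The same check could run at the top of the firewall update loop, so a restarting pve-cluster (with /etc/pve briefly unmounted) is treated like a missing config rather than "firewall disabled".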




___
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-09 Thread Alexandre DERUMIER
>>I can trigger the bug by simply restarting the pve-cluster service. (I think it
>>unmounts/remounts /etc/pve.)
>>That simply breaks my firewalled connections in my VMs.

Could we add a simple check to see if /etc/pve is mounted?



Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-09 Thread Alexandre DERUMIER
>>Booting node, pve-cluster fails to start/mount. But, in this case you're done
>>anyway as either you:
>>* change the "Wants=pve-cluster" to "Requires=pve-cluster", which then results
>>  in no pve-firewall service running at all
>>* or keep it as is, which starts the pve-firewall service nonetheless, but
>>  with default off as the config cannot be read.

>>The latter may even be slightly better as, as soon as pve-cluster got repaired,
>>the firewall service starts to work correctly automatically...

I can trigger the bug by simply restarting the pve-cluster service. (I think it
unmounts/remounts /etc/pve.)
That simply breaks my firewalled connections in my VMs.







Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-09 Thread Thomas Lamprecht
On 1/9/19 9:17 AM, Thomas Lamprecht wrote:
> On 1/9/19 8:36 AM, Alexandre DERUMIER wrote:
 Hmm, but if one wants to restore the defaults by simply deleting the file he'd
 need to restart the firewall daemon too. Not really sure if this is ideal
 either... Even if we could do heuristics for whether the file was really
 removed/truncated (double checks) that would just feel hacky and, as said
 above, such actions can get you in trouble with all processes where there
 are readers and writers, so this should be handled by the one updating the file.
>>
>> Ok, I understand.
>> I'm also thinking of the case where we could have a corosync/network failure,
>> where /etc/pve couldn't be mounted anymore or isn't readable;
>> that means that in this case the firewall will be off too.
>> That seems bad for security.
> 
> Yeah, that's a valid concern.

Argh, wrong. On a network failure without quorum you still have the old state
represented in cluster.fw; the filesystem just went into read-only mode,
so modifications aren't possible anyway.

> Maybe we could simply omit changing rules or anything else if we are not 
> quorate?
> Would seem like the right thing to do, because in that case we cannot assume
> anything so it's best to keep the last valid state intact.
> 

So this is already the behaviour there. And pve-firewall depends on pve-cluster,
so it should only start after it is mounted. The single remaining issue could be:

Booting node, pve-cluster fails to start/mount. But in this case you're done
anyway, as either you:
* change the "Wants=pve-cluster" to "Requires=pve-cluster", which then results
  in no pve-firewall service running at all
* or keep it as is, which starts the pve-firewall service nonetheless, but with
  default off as the config cannot be read.

The latter may even be slightly better as, as soon as pve-cluster got repaired,
the firewall service starts to work correctly automatically...
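For illustration, the Wants= to Requires= change could be made via a systemd drop-in rather than editing the packaged unit file. This is only a sketch: the unit names follow the thread, and the drop-in is written to a scratch directory here instead of /etc/systemd/system/pve-firewall.service.d/:

```shell
# Write a systemd drop-in that makes pve-firewall hard-depend on
# pve-cluster. On a real node DROPIN_DIR would be
# /etc/systemd/system/pve-firewall.service.d and you would run
# `systemctl daemon-reload` afterwards.
DROPIN_DIR="${DROPIN_DIR:-/tmp/pve-firewall.service.d}"
mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Unit]
# Requires= stops pve-firewall entirely if pve-cluster fails,
# instead of letting it start with an unreadable config.
Requires=pve-cluster.service
After=pve-cluster.service
EOF

echo "wrote $DROPIN_DIR/override.conf"
```

As the thread notes, this is a trade-off: with Requires= a broken pve-cluster means no firewall daemon at all, while the packaged Wants= lets the daemon start and recover automatically once pve-cluster is repaired.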




Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-09 Thread Thomas Lamprecht
On 1/9/19 8:36 AM, Alexandre DERUMIER wrote:
>>> Hmm, but if one wants to restore the defaults by simply deleting the file he'd
>>> need to restart the firewall daemon too. Not really sure if this is ideal
>>> either... Even if we could do heuristics for whether the file was really
>>> removed/truncated (double checks) that would just feel hacky and, as said
>>> above, such actions can get you in trouble with all processes where there
>>> are readers and writers, so this should be handled by the one updating the file.
> 
> Ok, I understand.
> I'm also thinking of the case where we could have a corosync/network failure,
> where /etc/pve couldn't be mounted anymore or isn't readable;
> that means that in this case the firewall will be off too.
> That seems bad for security.

Yeah, that's a valid concern.
Maybe we could simply omit changing rules, or anything else, if we are not
quorate?
That would seem like the right thing to do, because in that case we cannot
assume anything, so it's best to keep the last valid state intact.

> 
>>> But maybe a note like "As with all other filesystems, you need to ensure a
>>> write operation is seen as atomic by any reading process by writing to a
>>> temporary file and then renaming (moving) it."
> 
> Sounds great :)



Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-08 Thread Alexandre DERUMIER
>>But that is true for file systems in general? Even if you're on a local
>>"standard" filesystem and have a reader/writer process pair you need to ensure
>>some level of atomicity, either with a (shared) rw_lock or, like in our
>>case where the reader reopens the file on every read-loop iteration anyway, a
>>"write to tmp then rename" is enough, as with this the reader never opens the
>>file while write operations are still in flight. So nothing specific to
>>pmxcfs.

Thanks Thomas. I really didn't know about this. (Or didn't remember.)

I found a small vim script to do atomic write (.tmp + mv)
https://github.com/vim-scripts/Atomic-Save


>>Hmm, but if one wants to restore the defaults by simply deleting the file he'd
>>need to restart the firewall daemon too. Not really sure if this is ideal
>>either... Even if we could do heuristics for whether the file was really
>>removed/truncated (double checks) that would just feel hacky and, as said
>>above, such actions can get you in trouble with all processes where there are
>>readers and writers, so this should be handled by the one updating the file.

Ok, I understand.
I'm also thinking of the case where we could have a corosync/network failure,
where /etc/pve couldn't be mounted anymore or isn't readable;
that means that in this case the firewall will be off too.
That seems bad for security.



>>But maybe a note like "As with all other filesystems, you need to ensure a
>>write operation is seen as atomic by any reading process by writing to a
>>temporary file and then renaming (moving) it."

Sounds great :)



Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-08 Thread Thomas Lamprecht
On 1/8/19 10:19 PM, Alexandre DERUMIER wrote:
>>> for those cases I use something like (pseudocode - I use salt, not puppet):
>>>
>>> - manage copy of file
>>> - if file has changed trigger:
>>>   - mv -v $managedfile $realfile
>>>
>>>
> 
>>> Greets,
>>> Stefan
> 
> Thanks Stefan, works fine indeed.
> 
> I really didn't know/remember that /etc/pve was not atomic without a move.
> Maybe we should add a warning about this in the documentation.

But that is true for file systems in general? Even if you're on a local
"standard" filesystem and have a reader/writer process pair you need to ensure
some level of atomicity, either with a (shared) rw_lock or, like in our
case where the reader reopens the file on every read-loop iteration anyway, a
"write to tmp then rename" is enough, as with this the reader never opens the
file while write operations are still in flight. So nothing specific to pmxcfs.
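The "write to tmp then rename" pattern described above can be sketched in shell (file names are placeholders; the key point is that the temporary file lives in the target's own directory, so the final mv is a single atomic rename(2), not a cross-filesystem copy):

```shell
# Atomically replace $target: readers opening the file at any point see
# either the complete old content or the complete new content, never a
# truncated file.
target="${TARGET:-/tmp/cluster.fw.demo}"

# mktemp in the target's directory so mv below is a rename(2).
tmp="$(mktemp "$(dirname "$target")/.cluster.fw.XXXXXX")"

cat > "$tmp" <<'EOF'
[OPTIONS]

enable: 1
policy_in: ACCEPT
EOF

mv "$tmp" "$target"   # the atomic step
echo "replaced $target"
```

A plain `cp` or an in-place editor write, by contrast, truncates the target first and fills it afterwards, which is exactly the empty-file window the repro script in this thread catches.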

> 
> also, maybe for firewall, could we add a protection for cluster.fw (

Hmm, but if one wants to restore the defaults by simply deleting the file he'd
need to restart the firewall daemon too. Not really sure if this is ideal
either... Even if we could do heuristics for whether the file was really
removed/truncated (double checks) that would just feel hacky and, as said
above, such actions can get you in trouble with all processes where there are
readers and writers, so this should be handled by the one updating the file.

But maybe a note like "As with all other filesystems, you need to ensure a
write operation is seen as atomic by any reading process by writing to a
temporary file and then renaming (moving) it."


> 
> sub update {
> my $code = sub {
> 
> my $cluster_conf = load_clusterfw_conf();
> +   return if !$cluster_conf;
> 
> my $cluster_options = $cluster_conf->{options};
> 
> if (!$cluster_options->{enable}) {
> PVE::Firewall::remove_pvefw_chains();
> return;
> }
> 
> 

Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-08 Thread Alexandre DERUMIER
>>for those cases I use something like (pseudocode - I use salt, not puppet):
>>
>>- manage copy of file
>>- if file has changed trigger:
>>   - mv -v $managedfile $realfile
>>

>>Greets,
>>Stefan

Thanks Stefan, works fine indeed.

I really didn't know/remember that /etc/pve was not atomic without a move.
Maybe we should add a warning about this in the documentation.


also, maybe for firewall, could we add a protection for cluster.fw (

sub update {
my $code = sub {

my $cluster_conf = load_clusterfw_conf();
+   return if !$cluster_conf;

my $cluster_options = $cluster_conf->{options};

if (!$cluster_options->{enable}) {
PVE::Firewall::remove_pvefw_chains();
return;
}

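The guard above is Perl, but the idea can be sketched in shell as well (the paths and the apply step are placeholders, not the real pve-firewall internals): skip the update entirely when the config is missing or empty, keeping the last applied rules.

```shell
# Only (re)apply firewall rules when the cluster config is present and
# non-empty; otherwise keep the currently loaded rules untouched.
CFG="${CFG:-/tmp/cluster.fw.guard-demo}"

apply_if_readable() {
    if [ ! -s "$CFG" ]; then
        # Missing or zero-length: most likely a replication/rename race
        # or an unmounted /etc/pve, not an intentional "disable".
        echo "config $CFG missing or empty, keeping current rules" >&2
        return 1
    fi
    echo "applying rules from $CFG"
}

printf '[OPTIONS]\nenable: 1\n' > "$CFG"
apply_if_readable          # applies
: > "$CFG"                 # simulate the truncated-file window
apply_if_readable || true  # refuses, keeps last state
```

The design choice mirrors the thread: only an explicit "enable: 0" should tear down the chains; an unreadable file should be treated as "no change", not "firewall off".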


Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-08 Thread Stefan Priebe - Profihost AG
Hi Alexandre,


Am 08.01.19 um 21:55 schrieb Alexandre DERUMIER:
>>> But, file_set_contents - which save_clusterfw_conf uses - does this 
>>> already[0],
>>> so maybe this is the "high-level fuse rename isn't atomic" bug again...
>>> May need to take a closer look tomorrow.
> 
> mmm, ok.
> 
> In my case, it was with a simple file copy (cp /tmp/cluster.fw
> /etc/pve/firewall/cluster.fw). (I manage cluster.fw through puppet for
> multiple clusters.)
> I can reproduce it too with a simple vi edit.
> 
> I think other users could trigger this too, if they use scripts to automate
> ipset (blacklist, ).
> 
> Maybe it would be better to only disable the firewall when the option is
> explicitly set to "enabled:0", and to not change any rules if cluster.fw
> is missing.


for those cases I use something like (pseudocode - I use salt, not puppet):

- manage copy of file
- if file has changed trigger:
   - mv -v $managedfile $realfile


Greets,
Stefan
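Stefan's manage-copy-then-move recipe could look roughly like this in plain shell (staging and target paths are illustrative; a config-management tool would normally handle the change detection itself):

```shell
# Deploy a managed file via a staging copy: write the new version next
# to the target, and only if it differs, rename it into place. The
# rename is atomic, so readers never see a half-written file.
staged="${STAGED:-/tmp/cluster.fw.staged}"
target="${TARGET2:-/tmp/cluster.fw.live}"

printf '[OPTIONS]\nenable: 1\n' > "$staged"

if [ -f "$target" ] && cmp -s "$staged" "$target"; then
    echo "unchanged, nothing to do"
    rm -f "$staged"
else
    mv -v "$staged" "$target"   # atomic replacement, only on change
fi
```

Because the staging file is written first and the target is only ever replaced by a rename, this avoids the truncated-file window that a direct `cp` into /etc/pve produces.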

> 
> 
> 
> - Original message -
> From: "Thomas Lamprecht" 
> To: "pve-devel" , "aderumier" 
> Sent: Tuesday, 8 January 2019 20:58:51
> Subject: Re: [pve-devel] firewall : possible bug/race when cluster.fw is 
> replicated and rules are updated ?
> 
> Hi, 
> 
> On 1/8/19 7:37 PM, Alexandre DERUMIER wrote: 
>> I'm able to reproduce with: 
>> --- 
>> on 1 host: 
>>
>> cluster.fw: 
>> [OPTIONS] 
>>
>> enable: 1 
>> policy_in: ACCEPT 
>>
>>
>>
>>
>> #!/usr/bin/perl 
>>
>> use IO::File; 
>> use PVE::Firewall; 
>> use Data::Dumper; 
>> use Time::HiRes qw ( time alarm sleep usleep ); 
>>
>> while (1) {
>>
>>     $filename = "/etc/pve/firewall/cluster.fw";
>>
>>     if (my $fh = IO::File->new($filename, O_RDONLY)) {
>>
>>         $cluster_conf = PVE::Firewall::parse_clusterfw_config($filename, $fh,
>>             $verbose);
>>         my $cluster_options = $cluster_conf->{options};
>>
>>         if (!$cluster_options->{enable}) {
>>             print Dumper($cluster_options);
>>             die "error\n";
>>         }
>>
>>     }
>>     usleep(100);
>> }
>>
>>
>> the script is running fine. 
>>
>>
>> on another host, edit the file (simple open/write);
>> the script on the first host then returns:
>>
>> $VAR1 = {}; 
>> error 
> 
> that is expected, AFAICT; a modify operation shouldn't be:
> * read FILE -> modify -> write FILE
> but rather:
> * read FILE -> modify -> write FILE.TMP -> move FILE.TMP to FILE
> if you want readers to always see valid content. Otherwise yes, you may have
> a small time window where the file is truncated.
> 
> But, file_set_contents - which save_clusterfw_conf uses - does this 
> already[0], 
> so maybe this is the "high-level fuse rename isn't atomic" bug again... 
> May need to take a closer look tomorrow. 
> 
> [0]: 
> https://git.proxmox.com/?p=pve-common.git;a=blob;f=src/PVE/Tools.pm;h=accf6539da94d2b5d5b6f4539310fe5c4d526c7e;hb=HEAD#l213
>  
> 
>>
>> - Mail original - 
>> De: "aderumier"  
>> À: "pve-devel"  
>> Envoyé: Mardi 8 Janvier 2019 19:15:06 
>> Objet: [pve-devel] firewall : possible bug/race when cluster.fw is 
>> replicated and rules are updated ? 
>>
>> Hi, 
>> I'm currently debugging a possible firewalling problem. 
>> I'm running some CephFS clients in VMs, firewalled by Proxmox. 
>> CephFS clients are really sensitive to network problems, mainly packet 
>> loss or dropped packets. 
>>
>> I'm really not sure, but I currently have Puppet updating my cluster.fw 
>> at a regular interval, and sometimes all the VMs on a specific host (or 
>> multiple hosts) have a small disconnect at the same time (maybe a few 
>> seconds). 
>>
>>
>> I would like to know if cluster.fw replication is atomic in /etc/pve/, 
>> or if there is any chance that, during file replication, the firewall 
>> tries to read the file and finds it empty? 
>>
>>
>> I just wonder (I'm really really not sure) if I could trigger this: 
>>
>>
>> sub update { 
>> my $code = sub { 
>>
>> my $cluster_conf = load_clusterfw_conf(); 
>> my $cluster_options = $cluster_conf->{options}; 
>>
>> if (!$cluster_options->{enable}) { 
>> PVE::Firewall::remove_pvefw_chains(); 
>> return; 
>> } 
>>
>>
>> cluster.fw not readable or absent, so remove_pvefw_chains() is called; 
>> then after some seconds, the rules are applied again. 
>> 
>> 
>> I'm going to add some logging to try to reproduce it. (BTW, it would be 
>> great to log rule changes; maybe an audit log with a diff would be nice.) 
>> ___ 
>> pve-devel mailing list 
>> pve-devel@pve.proxmox.com 
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
>>
> 
> 

Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-08 Thread Alexandre DERUMIER
>>But, file_set_contents - which save_clusterfw_conf uses - does this 
>>already[0],
>>so maybe this is the "high-level fuse rename isn't atomic" bug again...
>>May need to take a closer look tomorrow.

mmm, ok.

In my case, it was with a simple file copy (cp /tmp/cluster.fw 
/etc/pve/firewall/cluster.fw). (I manage cluster.fw through Puppet for multiple 
clusters.)
I can reproduce it too with a simple vi edit.

I think other users could trigger this too, if they use scripts to automate 
ipsets (blacklists, ...).

Maybe it would be better to only disable the firewall when the option is 
explicitly set to "enable: 0", and, if cluster.fw is missing, not to change any rules.
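
A rough sketch of that suggested behaviour (plain shell for illustration, not
the actual Perl in pve-firewall; the option name follows the cluster.fw
[OPTIONS] section shown elsewhere in this thread):

```shell
# Illustrative decision logic: only flush the firewall chains when the
# config explicitly says "enable: 0"; a missing or unreadable cluster.fw
# leaves the currently loaded rules untouched.
check_conf() {
    conf=$1
    if [ ! -r "$conf" ]; then
        echo "keep-rules"       # absent/unreadable: don't touch anything
    elif grep -q '^enable:[[:space:]]*0' "$conf"; then
        echo "remove-chains"    # explicitly disabled
    else
        echo "apply-rules"      # anything else: apply rules (simplified)
    fi
}

check_conf /nonexistent/cluster.fw          # prints: keep-rules
printf '[OPTIONS]\nenable: 0\n' > /tmp/demo.fw
check_conf /tmp/demo.fw                     # prints: remove-chains
printf '[OPTIONS]\nenable: 1\n' > /tmp/demo.fw
check_conf /tmp/demo.fw                     # prints: apply-rules
```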



- Mail original -
De: "Thomas Lamprecht" 
À: "pve-devel" , "aderumier" 
Envoyé: Mardi 8 Janvier 2019 20:58:51
Objet: Re: [pve-devel] firewall : possible bug/race when cluster.fw is 
replicated and rules are updated ?

Hi, 

On 1/8/19 7:37 PM, Alexandre DERUMIER wrote: 
> I'm able to reproduce with: 
> --- 
> on 1 host: 
> 
> cluster.fw: 
> [OPTIONS] 
> 
> enable: 1 
> policy_in: ACCEPT 
> 
> 
> 
> 
> #!/usr/bin/perl 
> 
> use IO::File; 
> use PVE::Firewall; 
> use Data::Dumper; 
> use Time::HiRes qw ( time alarm sleep usleep ); 
> 
> while(1){ 
> 
> $filename = "/etc/pve/firewall/cluster.fw"; 
> 
> if (my $fh = IO::File->new($filename, O_RDONLY)) { 
> 
> $cluster_conf = PVE::Firewall::parse_clusterfw_config($filename, $fh, 
> $verbose); 
> my $cluster_options = $cluster_conf->{options}; 
> 
> if (!$cluster_options->{enable}) { 
> print Dumper($cluster_options); 
> die "error\n"; 
> } 
> 
> } 
> usleep(100); 
> }; 
> 
> 
> the script is running fine. 
> 
> 
on another host, edit the file (a simple open/write); 
the script on the first host then returns: 
> 
> $VAR1 = {}; 
> error 

that is expected, AFAICT; a modify operation shouldn't be: 
* read FILE -> modify -> write FILE 
but rather: 
* read FILE -> modify -> write FILE.TMP -> move FILE.TMP to FILE 
if a valid content should always be readable. Otherwise yes, you may have a 
small time window where the file is truncated. 

But file_set_contents - which save_clusterfw_conf uses - does this already[0], 
so maybe this is the "high-level FUSE rename isn't atomic" bug again... 
May need to take a closer look tomorrow. 

[0]: 
https://git.proxmox.com/?p=pve-common.git;a=blob;f=src/PVE/Tools.pm;h=accf6539da94d2b5d5b6f4539310fe5c4d526c7e;hb=HEAD#l213
 

> 
> - Mail original - 
> De: "aderumier"  
> À: "pve-devel"  
> Envoyé: Mardi 8 Janvier 2019 19:15:06 
> Objet: [pve-devel] firewall : possible bug/race when cluster.fw is replicated 
> and rules are updated ? 
> 
> Hi, 
> I'm currently debugging a possible firewalling problem. 
> I'm running some CephFS clients in VMs, firewalled by Proxmox. 
> CephFS clients are really sensitive to network problems, mainly packet 
> loss or dropped packets. 
> 
> I'm really not sure, but I currently have Puppet updating my cluster.fw 
> at a regular interval, and sometimes all the VMs on a specific host (or 
> multiple hosts) have a small disconnect at the same time (maybe a few 
> seconds). 
> 
> 
> I would like to know if cluster.fw replication is atomic in /etc/pve/, 
> or if there is any chance that, during file replication, the firewall 
> tries to read the file and finds it empty? 
> 
> 
> I just wonder (I'm really really not sure) if I could trigger this: 
> 
> 
> sub update { 
> my $code = sub { 
> 
> my $cluster_conf = load_clusterfw_conf(); 
> my $cluster_options = $cluster_conf->{options}; 
> 
> if (!$cluster_options->{enable}) { 
> PVE::Firewall::remove_pvefw_chains(); 
> return; 
> } 
> 
> 
> cluster.fw not readable or absent, so remove_pvefw_chains() is called; 
> then after some seconds, the rules are applied again. 
> 
> 
> I'm going to add some logging to try to reproduce it. (BTW, it would be 
> great to log rule changes; maybe an audit log with a diff would be nice.) 



Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-08 Thread Thomas Lamprecht
Hi,

On 1/8/19 7:37 PM, Alexandre DERUMIER wrote:
> I'm able to reproduce with:
> ---
> on 1 host:
>
> cluster.fw:
> [OPTIONS]
>
> enable: 1
> policy_in: ACCEPT
>
>
>
>
> #!/usr/bin/perl
>
> use IO::File;
> use PVE::Firewall;
> use Data::Dumper;
> use Time::HiRes qw ( time alarm sleep usleep );
>
> while(1){
>
> $filename = "/etc/pve/firewall/cluster.fw";
>
> if (my $fh = IO::File->new($filename, O_RDONLY)) {
>
>  $cluster_conf = PVE::Firewall::parse_clusterfw_config($filename, 
> $fh, $verbose);
> my $cluster_options = $cluster_conf->{options};
>
> if (!$cluster_options->{enable}) {
>print Dumper($cluster_options);
>die "error\n";
> }
>
> } 
> usleep(100);
> };
>
>
> the script is running fine.
>
>
> on another host, edit the file (a simple open/write); 
> the script on the first host then returns: 
>
> $VAR1 = {};
> error

that is expected, AFAICT; a modify operation shouldn't be:
* read FILE -> modify -> write FILE
but rather:
* read FILE -> modify -> write FILE.TMP -> move FILE.TMP to FILE
if a valid content should always be readable. Otherwise yes, you may have a 
small time window where the file is truncated.

But file_set_contents - which save_clusterfw_conf uses - does this already[0],
so maybe this is the "high-level FUSE rename isn't atomic" bug again...
May need to take a closer look tomorrow.

[0]: 
https://git.proxmox.com/?p=pve-common.git;a=blob;f=src/PVE/Tools.pm;h=accf6539da94d2b5d5b6f4539310fe5c4d526c7e;hb=HEAD#l213
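
The write-tmp-then-rename pattern can be exercised with a small experiment
(on an ordinary local filesystem, not /etc/pve, so it demonstrates the POSIX
guarantee that the suspected pmxcfs bug would break):

```shell
# Rewrite a file many times via write-tmp-then-rename and check after
# every swap that it is never empty. The checks here are sequential,
# but rename(2) gives the same "old or new content, never truncated"
# view to concurrent readers on a POSIX filesystem.
f=$(mktemp)
printf 'enable: 1\n' > "$f"

empty_seen=0
i=0
while [ "$i" -lt 200 ]; do
    printf 'enable: 1\n' > "$f.tmp"   # write the full new content aside
    mv -f "$f.tmp" "$f"               # atomic swap on one filesystem
    [ -s "$f" ] || empty_seen=$((empty_seen + 1))
    i=$((i + 1))
done

echo "empty reads: $empty_seen"
```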

>
> - Mail original -
> De: "aderumier" 
> À: "pve-devel" 
> Envoyé: Mardi 8 Janvier 2019 19:15:06
> Objet: [pve-devel] firewall : possible bug/race when cluster.fw is replicated 
> and rules are updated ?
>
> Hi, 
> I'm currently debugging a possible firewalling problem. 
> I'm running some CephFS clients in VMs, firewalled by Proxmox. 
> CephFS clients are really sensitive to network problems, mainly packet 
> loss or dropped packets. 
>
> I'm really not sure, but I currently have Puppet updating my cluster.fw 
> at a regular interval, and sometimes all the VMs on a specific host (or 
> multiple hosts) have a small disconnect at the same time (maybe a few 
> seconds). 
>
>
> I would like to know if cluster.fw replication is atomic in /etc/pve/, 
> or if there is any chance that, during file replication, the firewall 
> tries to read the file and finds it empty? 
>
>
> I just wonder (I'm really really not sure) if I could trigger this: 
>
>
> sub update { 
> my $code = sub { 
>
> my $cluster_conf = load_clusterfw_conf(); 
> my $cluster_options = $cluster_conf->{options}; 
>
> if (!$cluster_options->{enable}) { 
> PVE::Firewall::remove_pvefw_chains(); 
> return; 
> } 
>
>
> cluster.fw not readable or absent, so remove_pvefw_chains() is called; 
> then after some seconds, the rules are applied again. 
> 
> 
> I'm going to add some logging to try to reproduce it. (BTW, it would be 
> great to log rule changes; maybe an audit log with a diff would be nice.) 





Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-08 Thread Alexandre DERUMIER
I'm able to reproduce with:
---
on 1 host:

cluster.fw:
[OPTIONS]

enable: 1
policy_in: ACCEPT




#!/usr/bin/perl

use strict;
use warnings;

use IO::File;                 # also exports the Fcntl O_* constants
use PVE::Firewall;
use Data::Dumper;
use Time::HiRes qw(usleep);

my $filename = "/etc/pve/firewall/cluster.fw";
my $verbose = 0;

# poll the cluster config in a tight loop and bail out as soon as a
# parse yields no 'enable' option, i.e. the file was read while empty
while (1) {
    if (my $fh = IO::File->new($filename, O_RDONLY)) {
        my $cluster_conf = PVE::Firewall::parse_clusterfw_config($filename, $fh, $verbose);
        my $cluster_options = $cluster_conf->{options};

        if (!$cluster_options->{enable}) {
            print Dumper($cluster_options);
            die "error\n";
        }
    }
    usleep(100);
}


the script runs fine.


on another host, edit the file (a simple open/write);
the script on the first host then returns:

$VAR1 = {};
error
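
That empty $VAR1 is consistent with the reader catching the truncate-then-write
window of a plain (non-atomic) rewrite; in shell terms the window looks like
this (local temp file for illustration):

```shell
# A plain redirection opens the target with O_TRUNC, so between the
# truncation and the subsequent write the file is momentarily empty.
# Simulate "a reader samples the file inside that window" by stopping
# right after the truncation step:
f=$(mktemp)
printf '[OPTIONS]\nenable: 1\n' > "$f"

: > "$f"      # the truncation half of a non-atomic rewrite
if [ -s "$f" ]; then
    echo "non-empty"
else
    echo "empty - this is what the parser would see"
fi

printf '[OPTIONS]\nenable: 1\n' > "$f"   # the writer finishes the rewrite
```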





- Mail original -
De: "aderumier" 
À: "pve-devel" 
Envoyé: Mardi 8 Janvier 2019 19:15:06
Objet: [pve-devel] firewall : possible bug/race when cluster.fw is replicated 
and rules are updated ?

Hi, 
I'm currently debugging a possible firewalling problem. 
I'm running some CephFS clients in VMs, firewalled by Proxmox. 
CephFS clients are really sensitive to network problems, mainly packet loss 
or dropped packets. 

I'm really not sure, but I currently have Puppet updating my cluster.fw at a 
regular interval, and sometimes all the VMs on a specific host (or multiple 
hosts) have a small disconnect at the same time (maybe a few seconds). 


I would like to know if cluster.fw replication is atomic in /etc/pve/, or if 
there is any chance that, during file replication, the firewall tries to read 
the file and finds it empty? 


I just wonder (I'm really, really not sure) if I could trigger this: 


sub update { 
    my $code = sub { 

        my $cluster_conf = load_clusterfw_conf(); 
        my $cluster_options = $cluster_conf->{options}; 

        if (!$cluster_options->{enable}) { 
            PVE::Firewall::remove_pvefw_chains(); 
            return; 
        } 


cluster.fw not readable or absent, so remove_pvefw_chains() is called; 
then after some seconds, the rules are applied again. 


I'm going to add some logging to try to reproduce it. (BTW, it would be great 
to log rule changes; maybe an audit log with a diff would be nice.) 


[pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

2019-01-08 Thread Alexandre DERUMIER
Hi,
I'm currently debugging a possible firewalling problem.
I'm running some CephFS clients in VMs, firewalled by Proxmox.
CephFS clients are really sensitive to network problems, mainly packet loss 
or dropped packets.

I'm really not sure, but I currently have Puppet updating my cluster.fw at a 
regular interval, and sometimes all the VMs on a specific host (or multiple 
hosts) have a small disconnect at the same time (maybe a few seconds).


I would like to know if cluster.fw replication is atomic in /etc/pve/, or if 
there is any chance that, during file replication, the firewall tries to read 
the file and finds it empty?


I just wonder (I'm really, really not sure) if I could trigger this:


sub update {
    my $code = sub {

        my $cluster_conf = load_clusterfw_conf();
        my $cluster_options = $cluster_conf->{options};

        if (!$cluster_options->{enable}) {
            PVE::Firewall::remove_pvefw_chains();
            return;
        }


cluster.fw not readable or absent, so remove_pvefw_chains() is called; 
then after some seconds, the rules are applied again.


I'm going to add some logging to try to reproduce it. (BTW, it would be great 
to log rule changes; maybe an audit log with a diff would be nice.)
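
That audit-log idea could be as simple as diffing the newly generated ruleset
against the last applied one before loading it (hypothetical sketch with
made-up paths, not existing pve-firewall code):

```shell
# Hypothetical audit-log sketch: keep a copy of the last applied
# ruleset, diff the new one against it, and record any change.
old=/tmp/ruleset.applied
new=/tmp/ruleset.new
log=/tmp/ruleset-audit.log
: > "$log"   # start with an empty audit log for this demo

printf 'ACCEPT tcp dport 22\n' > "$old"
printf 'ACCEPT tcp dport 22\nACCEPT tcp dport 8006\n' > "$new"

# diff exits non-zero when the files differ, so the branch runs only
# on an actual ruleset change
if ! diff -u "$old" "$new" >> "$log"; then
    echo "ruleset changed, diff appended to $log"
fi
mv -f "$new" "$old"   # the new ruleset becomes the applied one
```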