Re: [ClusterLabs] Alert notes

2016-06-17 Thread Dimitri Maziuk
On 06/17/2016 11:12 AM, Jan Pokorný wrote:

> (That being said, I've already presented my subversive opinion that
> shell introduces more headaches than it's worth: using it may be the
> most natural choice, with almost no barrier to entry, but it's actually
> quite hard to make scripts bullet-proof.  For instance, chances are
> quite high that a script will be derailed just by a parameter containing
> a space [not talking about quotes]:
> http://clusterlabs.org/pipermail/users/2015-May/000403.html)

C has no strings, shells (plural) are evil, perl is unmaintainable, VM
(or whatever you call it: runtime, garbage-collected)-based ones are
top-heavy and unpredictable, and python is the worst of all worlds. But
hey, it's still way ahead of C++ on sanity points. Anyone tried gnat lately?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Alert notes

2016-06-17 Thread Jan Pokorný
On 15/06/16 18:45 +0200, Klaus Wenninger wrote:
> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>> Did you think about filtering the environment variables passed to the
>> alert scripts?  NOTIFY_SOCKET probably shouldn't be present, and PATH
>> probably shouldn't contain sbin directories; I guess all these are
>> inherited from systemd in my case.
> 
> It is just what crmd comes along with ... but interesting point ...

... and, with the Shellshock vulnerability in mind, also a little bit
worrying (yes, even nowadays).
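To make the filtering idea concrete, here is a minimal Python sketch of a
wrapper that cleans the environment before starting an alert agent. It is
hypothetical: crmd is written in C and does not filter anything this way
today, and the deny-list (NOTIFY_SOCKET, exported bash functions, sbin
entries in PATH) simply follows the points raised above.

```python
import os
import subprocess

# Illustrative deny-list (assumption): not what crmd actually filters.
DROP_VARS = {"NOTIFY_SOCKET"}


def filtered_env(parent_env):
    """Return a copy of parent_env that is safer to hand to an alert agent."""
    env = {}
    for name, value in parent_env.items():
        if name in DROP_VARS:
            continue
        # Skip exported bash functions, the Shellshock attack surface:
        # unpatched bash encodes them as values starting with "() {",
        # patched bash as variables named BASH_FUNC_<name>%%.
        if value.startswith("() {") or name.startswith("BASH_FUNC_"):
            continue
        env[name] = value
    if "PATH" in env:
        # Drop sbin directories (/sbin, /usr/sbin, /usr/local/sbin, ...).
        env["PATH"] = os.pathsep.join(
            d for d in env["PATH"].split(os.pathsep)
            if d and os.path.basename(d.rstrip("/")) != "sbin"
        )
    return env


def run_alert(agent_path, parent_env=None):
    """Run a (hypothetical) alert agent with the filtered environment."""
    source = os.environ if parent_env is None else parent_env
    return subprocess.run([agent_path], env=filtered_env(source), check=False)
```

For example, filtered_env() turns
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin into
/usr/local/bin:/usr/bin and drops NOTIFY_SOCKET.  An allow-list would be
stricter, but it would have to know every CRM_alert_* variable in advance.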

(That being said, I've already presented my subversive opinion that
shell introduces more headaches than it's worth: using it may be the
most natural choice, with almost no barrier to entry, but it's actually
quite hard to make scripts bullet-proof.  For instance, chances are
quite high that a script will be derailed just by a parameter containing
a space [not talking about quotes]:
http://clusterlabs.org/pipermail/users/2015-May/000403.html)
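A small demonstration of that derailing, driven from Python so it can be
checked mechanically.  It asks sh how many words an alert parameter turns
into when expanded unquoted versus quoted; the parameter value is made up
for illustration.

```python
import subprocess


def count_words(value, quoted):
    """Return how many arguments sh sees after expanding $1."""
    # $1 holds the value; `set --` re-assigns the positional parameters
    # from the expansion, so $# tells us how many words it produced.
    expansion = '"$1"' if quoted else "$1"
    script = f"set -- {expansion}; echo $#"
    result = subprocess.run(
        ["sh", "-c", script, "sh", value],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


value = "monitor failed on node one"  # hypothetical alert description
print(count_words(value, quoted=False))  # 5: word splitting
print(count_words(value, quoted=True))   # 1: one argument, as intended
```

Unquoted, the value is also subject to pathname expansion, so a "*" in it
could pull in file names from the agent's current directory.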

-- 
Jan (Poki)




Re: [ClusterLabs] Alert notes

2016-06-16 Thread Ferenc Wágner
Klaus Wenninger  writes:

> On 06/16/2016 11:05 AM, Ferenc Wágner wrote:
>
>> Klaus Wenninger  writes:
>>
>>> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>>>
>>>> I think the default timestamp should contain date and time zone
>>>> specification to make it unambiguous.
>>>
>>> Idea was to have a trade-off between length and amount of information.
>>
>> I don't think it's worth saving a couple of bytes by dropping this
>> information.  In many cases there will be some way to recover it (from
>> SMTP headers or system logs), but that complicates things.
>
> It wasn't about saving some bytes in the size of a file or so, but
> rather about keeping it readable. If the timestamp fills your screen,
> you won't be able to read the actual information... have a look
> at /var/log/messages...
> The sole intention was to have a default that creates nice-looking
> output together with the file example, to give people an impression of
> what they could do with the feature.

I see.  Incidentally, the file example is probably the one which would
profit most from having full timestamps.  And some locking.
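One way the file example could look with a full timestamp and locking,
sketched in Python rather than shell.  The function name, the choice of
ISO 8601 with microseconds and the 0644 mode are assumptions, not taken
from the shipped sample agent; an agent could, hypothetically, use the
recipient (CRM_alert_recipient) as the path.

```python
import fcntl
import os
from datetime import datetime, timezone


def append_alert(path, message, now=None):
    """Append one timestamped line to a shared alert log file."""
    if now is None:
        # Local time, but carrying its UTC offset so the line is unambiguous.
        now = datetime.now(timezone.utc).astimezone()
    # Full timestamp: date, time with microseconds, and UTC offset.
    line = f"{now.isoformat(timespec='microseconds')} {message}\n"
    # O_APPEND makes every write go to the current end of file; the
    # exclusive flock additionally serializes concurrent agent runs.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

The resulting lines look like "2016-06-16T12:34:56.780000+02:00 ...",
which is long, but sorts correctly and survives being read in another
time zone.  The 0644 mode is still reduced by the inherited umask.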

>> In a similar vein, keeping the sequence number around would simplify
>> alert ordering and loss detection on the receiver side.  Especially with
>> SNMP, where the transport is unreliable as well.
>
> Nice idea... any OID in mind?

No.  But you can always extend PACEMAKER-MIB.

> Unfortunately the sequence number we currently provide as an environment
> variable is not really fit for this purpose. It counts up with each
> and every alert being sent on a single node. So if you have multiple
> alerts configured, you would experience gaps that prevent you from
> using it for loss detection.

I see, it isn't per alert, unfortunately.  Still better than nothing,
though...
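To illustrate why a node-wide counter defeats loss detection, a minimal
receiver-side sketch in Python.  It assumes a hypothetical per-alert
counter that increments by one for every message that alert sends; fed
with a node-wide counter shared by two alerts, it reports gaps even though
nothing was lost.

```python
from collections import defaultdict


def find_gaps(received):
    """Report missing sequence numbers per alert.

    `received` is a list of (alert_id, seq) pairs in arrival order.  The
    first message of each alert sets the baseline, so losses before it
    cannot be detected.
    """
    last = {}
    missing = defaultdict(list)
    for alert_id, seq in received:
        prev = last.get(alert_id)
        if prev is None:
            last[alert_id] = seq
        elif seq > prev:
            # Everything strictly between prev and seq has not arrived (yet).
            missing[alert_id].extend(range(prev + 1, seq))
            last[alert_id] = seq
        elif seq in missing[alert_id]:
            # A late (reordered) delivery fills an earlier gap.
            missing[alert_id].remove(seq)
    return {k: v for k, v in missing.items() if v}
```

With the node-wide counter, find_gaps([("A", 1), ("B", 2), ("A", 3),
("B", 4)]) reports {"A": [2], "B": [3]} although every message arrived;
with per-alert numbering the same traffic yields an empty result.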

>>>> (BTW I'd prefer to run the alert scripts as a different user than the
>>>> various Pacemaker components, but that would lead too far now.)
>>>
>>> well, something we thought about already and a point where the
>>> new feature breaks the ClusterMon-Interface.
>>> Unfortunately the impact is quite high - crmd has dropped privileges -
>>> but if the pain-level rises high enough ...
>>
>> There's very little room to do this.  You'd need to configure an alert
>> user and group, and store them in the saved uid/gid set before dropping
>> privileges for the crmd process.  Or use a separate daemon for sending
>> alerts, which feels cleaner.
>
> Yes, a second daemon was the idea. We don't want to give more rights
> to crmd than it needs. Btw. the daemon is there already: lrmd ;-)

It's running as root already, so at least no problem changing to any
user.  And the default could be hacluster.
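A sketch of what the privileged side could do, assuming (hypothetically)
that a root daemon such as lrmd launches the agent itself.  It uses the
user, group and extra_groups parameters of Python's subprocess (3.9 and
later); hacluster is just the default proposed above, and this is not how
lrmd currently behaves.

```python
import os
import pwd
import subprocess


def run_as(argv, username="hacluster", env=None):
    """Run an alert agent as `username` (sketch; needs Python 3.9+).

    Meant for a process that is still root (as lrmd is); a non-root
    caller can only "switch" to its own user.
    """
    pw = pwd.getpwnam(username)  # raises KeyError for an unknown user
    extra = None
    if os.geteuid() == 0:
        # Replace root's supplementary groups with the target user's own.
        extra = os.getgrouplist(username, pw.pw_gid)
    return subprocess.run(
        argv,
        user=pw.pw_uid,
        group=pw.pw_gid,
        extra_groups=extra,
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
```

Doing the switch in the launcher keeps crmd's own privileges untouched,
which matches the reasoning for a second daemon.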

>> You are right.  The snmptrap tool does the string->binary conversion if
>> it gets the correct format.  Otherwise, if the length matches, it does a
>> plain cast to binary, interpreting for example 12:34:56.78 as
>> 12594-58-51,52:58:53.54,.55:56.  Looks like the sample SNMP alert agent
>> shouldn't let the users choose any timestamp format but
>> %Y-%m-%d,%H:%M:%S.%1N,%:z; unfortunately there's no way to enforce this
>> in the current design.
>
> Well, generic vs. failsafe  ;-)
> Of course one could introduce something like the metadata in RAs
> to achieve things like that, but we wanted to keep things simple...
> After all, the scripts are just examples... and the timestamp format
> that should work is given in the header of the script...

More emphasis would help, I think.
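For reference, the DateAndTime layout from RFC 2579 can be sketched in
Python as below.  This is not net-snmp's parsing code, only an
illustration of the octet layout; decoding the 11 ASCII bytes of
"12:34:56.78" as if they were DateAndTime octets reproduces the garbled
value quoted above.

```python
import re
import struct

# RFC 2579 DateAndTime, DISPLAY-HINT "2d-1d-1d,1d:1d:1d.1d,1a1d:1d":
# year (2 octets, big-endian), month, day, hour, minutes, seconds,
# deci-seconds, then optionally direction ('+'/'-'), hours and minutes from UTC.
_TEXT = re.compile(
    r"(\d{4})-(\d{1,2})-(\d{1,2}),(\d{1,2}):(\d{1,2}):(\d{1,2})\.(\d)"
    r"(?:,([+-])(\d{1,2}):(\d{1,2}))?"
)


def encode_date_and_time(text):
    """Pack a textual timestamp into 8 (no zone) or 11 (with zone) octets."""
    m = _TEXT.fullmatch(text)
    if m is None:
        raise ValueError(f"not a DateAndTime string: {text!r}")
    fields = [int(g) for g in m.groups()[:7]]
    octets = struct.pack(">H6B", *fields)
    if m.group(8):
        octets += m.group(8).encode("ascii") + bytes([int(m.group(9)), int(m.group(10))])
    return octets


def decode_date_and_time(octets):
    """Render DateAndTime octets according to the DISPLAY-HINT above."""
    if len(octets) not in (8, 11):
        raise ValueError("DateAndTime is 8 or 11 octets long")
    year, month, day, hour, minute, sec, decisec = struct.unpack(">H6B", octets[:8])
    text = f"{year}-{month}-{day},{hour}:{minute}:{sec}.{decisec}"
    if len(octets) == 11:
        text += f",{chr(octets[8])}{octets[9]}:{octets[10]}"
    return text


# An 11-character string that is not in DateAndTime syntax, cast verbatim:
print(decode_date_and_time(b"12:34:56.78"))  # 12594-58-51,52:58:53.54,.55:56
```

The display hint does not zero-pad, so a round trip of
"2016-06-16,12:34:56.7,+02:00" comes back as "2016-6-16,12:34:56.7,+2:0";
leaving out the zone part gives the 8-octet form.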

>> Maybe it would be more appropriate to get the timestamp from crmd as
>> a high resolution (fractional) epoch all the time, and do the string
>> conversion in the agents as necessary.  One could still control the
>> format via instance_attributes where allowed.  Or keep around the
>> current mechanism as well to reduce code duplication in the agents.
>> Just some ideas...
>
> epoch was actually my first default ...
> an additional epoch might be an interesting alternative...

It would be useful.  Actually, crm_time_format_hr() currently fails for
any format string ending with any %-escape but N.  For example, "%Yx" is
formatted as "2016x", but "%Y" returns NULL.  You can avoid fixing this
by providing a fractional epoch instead. :)
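The behaviour one would expect from such a formatter can be sketched in
Python: expand %N (and %1N ... %9N, as in GNU date) separately and hand
everything else to strftime, so a format ending in an ordinary escape like
%Y still yields a string.  This is a hypothetical illustration, not a
patch for the C function crm_time_format_hr().

```python
import re
from datetime import datetime, timezone

# "%%" is matched first so a literal percent is never mistaken for an escape.
_ESCAPES = re.compile(r"%%|%([1-9]?)N")


def format_hr(fmt, when, nanoseconds):
    """strftime() plus GNU-date-style %N / %<digits>N (fractional seconds)."""
    if not 0 <= nanoseconds < 1_000_000_000:
        raise ValueError("nanoseconds out of range")

    def expand(match):
        if match.group(0) == "%%":
            return "%%"  # leave it for strftime to turn into "%"
        digits = int(match.group(1) or 9)
        return f"{nanoseconds:09d}"[:digits]  # truncate, don't round

    # Everything other than %N is left to strftime, including a trailing %Y.
    return when.strftime(_ESCAPES.sub(expand, fmt))


print(format_hr("%Y", datetime(2016, 6, 16, tzinfo=timezone.utc), 0))  # 2016
```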
-- 
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Alert notes

2016-06-16 Thread Klaus Wenninger
On 06/16/2016 11:05 AM, Ferenc Wágner wrote:
> Klaus Wenninger  writes:
>
>> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>>
>>> Please find some random notes about my adventures testing the new alert
>>> system.
>>>
>>> The first alert example in the documentation has no recipient:
>>>
>>> 
>>>
>>> In the example above, the cluster will call my-script.sh for each
>>> event.
>>>
>>> while the next section starts as:
>>>
>>> Each alert may be configured with one or more recipients. The cluster
>>> will call the agent separately for each recipient.
>> The goal of the first example is to be as simple as possible.
>> But of course it makes sense to mention that it is not compulsory
>> to add a recipient. And I guess it makes sense to point that out,
>> as it is just ugly to think that you have to fake a recipient when
>> it wouldn't make any sense in your context.
> I agree.
>
>>> I think the default timestamp should contain date and time zone
>>> specification to make it unambiguous.
>> Idea was to have a trade-off between length and amount of information.
> I don't think it's worth saving a couple of bytes by dropping this
> information.  In many cases there will be some way to recover it (from
> SMTP headers or system logs), but that complicates things.
It wasn't about saving some bytes in the size of a file or so, but
rather about keeping it readable. If the timestamp fills your screen,
you won't be able to read the actual information... have a look
at /var/log/messages...
The sole intention was to have a default that creates nice-looking
output together with the file example, to give people an impression of
what they could do with the feature.
>
> In a similar vein, keeping the sequence number around would simplify
> alert ordering and loss detection on the receiver side.  Especially with
> SNMP, where the transport is unreliable as well.
Nice idea... any OID in mind?
Unfortunately the sequence number we currently provide as an environment
variable is not really fit for this purpose. It counts up with each and
every alert being sent on a single node. So if you have multiple alerts
configured, you would experience gaps that prevent you from using it for
loss detection.
>
>>> (BTW I'd prefer to run the alert scripts as a different user than the
>>> various Pacemaker components, but that would lead too far now.)
>> well, something we thought about already and a point where the
>> new feature breaks the ClusterMon-Interface.
>> Unfortunately the impact is quite high - crmd has dropped privileges -
>> but if the pain-level rises high enough ...
> There's very little room to do this.  You'd need to configure an alert
> user and group, and store them in the saved uid/gid set before dropping
> privileges for the crmd process.  Or use a separate daemon for sending
> alerts, which feels cleaner.
Yes, a second daemon was the idea. We don't want to give more rights
to crmd than it needs. Btw. the daemon is there already: lrmd ;-)
>>> The SNMP agent seems to have a problem with hrSystemDate, which should
>>> be an OCTETSTR with strict format, not some plain textual timestamp.
>>> But I haven't really looked into this yet.
>> Actually I had tried it with the snmptrap tool coming with RHEL 7.2,
>> and it worked with the string given in the example.
>> Did you copy it 1:1? There is a typo in the document: the
>> double quotes are doubled. The format is strict and there are actually
>> 2 formats allowed - one with timezone and one without. The
>> format string given should match the latter.
> You are right.  The snmptrap tool does the string->binary conversion if
> it gets the correct format.  Otherwise, if the length matches, it does a
> plain cast to binary, interpreting for example 12:34:56.78 as
> 12594-58-51,52:58:53.54,.55:56.  Looks like the sample SNMP alert agent
> shouldn't let the users choose any timestamp format but
> %Y-%m-%d,%H:%M:%S.%1N,%:z; unfortunately there's no way to enforce this
> in the current design.
Well, generic vs. failsafe  ;-)
Of course one could introduce something like the metadata in RAs
to achieve things like that, but we wanted to keep things simple...
After all, the scripts are just examples... and the timestamp format
that should work is given in the header of the script...

> Maybe it would be more appropriate to get the timestamp from crmd as
> a high resolution (fractional) epoch all the time, and do the string
> conversion in the agents as necessary.  One could still control the
> format via instance_attributes where allowed.  Or keep around the
> current mechanism as well to reduce code duplication in the agents.
> Just some ideas...
epoch was actually my first default ...
an additional epoch might be an interesting alternative...



Re: [ClusterLabs] Alert notes

2016-06-16 Thread Ferenc Wágner
Klaus Wenninger  writes:

> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>
>> Please find some random notes about my adventures testing the new alert
>> system.
>>
>> The first alert example in the documentation has no recipient:
>>
>> 
>>
>> In the example above, the cluster will call my-script.sh for each
>> event.
>>
>> while the next section starts as:
>>
>> Each alert may be configured with one or more recipients. The cluster
>> will call the agent separately for each recipient.
>
> The goal of the first example is to be as simple as possible.
> But of course it makes sense to mention that it is not compulsory
> to add a recipient. And I guess it makes sense to point that out,
> as it is just ugly to think that you have to fake a recipient when
> it wouldn't make any sense in your context.

I agree.

>> I think the default timestamp should contain date and time zone
>> specification to make it unambiguous.
>
> Idea was to have a trade-off between length and amount of information.

I don't think it's worth saving a couple of bytes by dropping this
information.  In many cases there will be some way to recover it (from
SMTP headers or system logs), but that complicates things.

In a similar vein, keeping the sequence number around would simplify
alert ordering and loss detection on the receiver side.  Especially with
SNMP, where the transport is unreliable as well.

>> (BTW I'd prefer to run the alert scripts as a different user than the
>> various Pacemaker components, but that would lead too far now.)
>
> well, something we thought about already and a point where the
> new feature breaks the ClusterMon-Interface.
> Unfortunately the impact is quite high - crmd has dropped privileges -
> but if the pain-level rises high enough ...

There's very little room to do this.  You'd need to configure an alert
user and group, and store them in the saved uid/gid set before dropping
privileges for the crmd process.  Or use a separate daemon for sending
alerts, which feels cleaner.

>> The SNMP agent seems to have a problem with hrSystemDate, which should
>> be an OCTETSTR with strict format, not some plain textual timestamp.
>> But I haven't really looked into this yet.
>
> Actually I had tried it with the snmptrap tool coming with RHEL 7.2,
> and it worked with the string given in the example.
> Did you copy it 1:1? There is a typo in the document: the
> double quotes are doubled. The format is strict and there are actually
> 2 formats allowed - one with timezone and one without. The
> format string given should match the latter.

You are right.  The snmptrap tool does the string->binary conversion if
it gets the correct format.  Otherwise, if the length matches, it does a
plain cast to binary, interpreting for example 12:34:56.78 as
12594-58-51,52:58:53.54,.55:56.  Looks like the sample SNMP alert agent
shouldn't let the users choose any timestamp format but
%Y-%m-%d,%H:%M:%S.%1N,%:z; unfortunately there's no way to enforce this
in the current design.  Maybe it would be more appropriate to get the
timestamp from crmd as a high resolution (fractional) epoch all the
time, and do the string conversion in the agents as necessary.  One
could still control the format via instance_attributes where allowed.
Or keep around the current mechanism as well to reduce code duplication
in the agents.  Just some ideas...
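If crmd handed out a fractional epoch instead, the conversion inside an
agent could be as small as the following Python sketch.  The function name
and taking the zone as a parameter (defaulting to UTC) are assumptions;
the output layout mirrors %Y-%m-%d,%H:%M:%S.%1N,%:z.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def epoch_to_snmp_string(epoch, tz=timezone.utc):
    """Format a fractional epoch like %Y-%m-%d,%H:%M:%S.%1N,%:z would.

    `epoch` may be a number or a string such as "1466073296.78"; going via
    Decimal avoids binary floating-point surprises in the fraction.
    """
    value = Decimal(str(epoch))
    if value < 0:
        raise ValueError("pre-1970 timestamps are not handled in this sketch")
    seconds = int(value)
    decisec = int((value - seconds) * 10)  # truncate to one digit, like %1N
    when = datetime.fromtimestamp(seconds, tz)
    offset_min = int(when.utcoffset().total_seconds()) // 60
    sign = "+" if offset_min >= 0 else "-"
    hours, minutes = divmod(abs(offset_min), 60)
    return f"{when:%Y-%m-%d,%H:%M:%S}.{decisec},{sign}{hours:02d}:{minutes:02d}"


print(epoch_to_snmp_string("1466073296.78", timezone(timedelta(hours=2))))
# 2016-06-16,12:34:56.7,+02:00
```

Every agent would then format the same instant its own way, at the cost
of a few lines of conversion code per agent.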
-- 
Regards,
Feri



Re: [ClusterLabs] Alert notes

2016-06-15 Thread Klaus Wenninger
On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
> Hi,
>
> Please find some random notes about my adventures testing the new alert
> system.
>
> The first alert example in the documentation has no recipient:
>
> 
>
> In the example above, the cluster will call my-script.sh for each
> event.
>
> while the next section starts as:
>
> Each alert may be configured with one or more recipients. The cluster
> will call the agent separately for each recipient.
The goal of the first example is to be as simple as possible.
But of course it makes sense to mention that it is not compulsory
to add a recipient. And I guess it makes sense to point that out,
as it is just ugly to think that you have to fake a recipient when
it wouldn't make any sense in your context.
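A minimal Python sketch of an agent that treats the recipient as optional
instead of requiring a fake one.  It assumes, without having checked, that
an alert configured without recipients leaves CRM_alert_recipient unset or
empty; the fallback value is up to the agent.

```python
import os


def resolve_recipient(environ, default):
    """Return the configured recipient, or `default` if the alert has none.

    Assumption: for an alert without recipients, CRM_alert_recipient is
    either unset or empty, so both cases are treated alike.
    """
    recipient = environ.get("CRM_alert_recipient", "").strip()
    return recipient or default


if __name__ == "__main__":
    # Hypothetical default: mail root when no recipient is configured.
    print(resolve_recipient(os.environ, "root"))
```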
>
> and the rest of the documentation considers the recipient always
> present.  For example, in table 7.2:
>
> CRM_alert_recipient    The configured recipient
>
> then
>
> Alert agents will be called once per recipient.
>
> While in specialized cases it certainly makes sense that some alerts
> don't take recipients, I find it confusing that the first introductory
> example demonstrates something totally unacknowledged by the definitive
> text following it.
>
> I think the default timestamp should contain date and time zone
> specification to make it unambiguous.
Idea was to have a trade-off between length and amount of
information.
>
> Did you think about filtering the environment variables passed to the
> alert scripts?  NOTIFY_SOCKET probably shouldn't be present, and PATH
> probably shouldn't contain sbin directories; I guess all these are
> inherited from systemd in my case.
It is just what crmd comes along with ... but interesting point ...
>
> I was also hit again by the "strange umask" problem here.  It's set to
> 0026, which tends to get where nobody expects it (see for example
> https://bugs.launchpad.net/fuel/+bug/1397284,
> http://clusterlabs.org/pipermail/users/2015-June/000682.html, or
> http://bugs.clusterlabs.org/show_bug.cgi?id=5268).  In practice, alert
> scripts won't often create local files, but it's a pity we have to fight
> fallout from the logfile creation again.
Again, this is inherited from crmd, and an interesting point ...
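An agent launcher (or the agent itself, via umask(2)) could set the mask
explicitly instead of inheriting crmd's 0026.  A Python sketch, assuming
0022 is the desired default and Python 3.9+ for the umask argument of
subprocess; it also shows what a file created under each mask ends up as.

```python
import os
import stat
import subprocess
import sys


def run_alert_with_umask(argv, mask=0o022, env=None):
    """Run an alert agent with an explicit umask rather than crmd's 0026.

    0o022 is an assumption (a common distribution default).
    """
    return subprocess.run(argv, umask=mask, env=env, check=False)


def mode_of_new_file(path, mask):
    """Create `path` from a child process under `mask`; return its permission bits."""
    creator = [sys.executable, "-c", "import sys; open(sys.argv[1], 'w').close()", path]
    run_alert_with_umask(creator, mask)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Under 0026 a newly created file ends up as 0640 (read-only for the group,
no access for others); under 0022 it is 0644.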
>
> (BTW I'd prefer to run the alert scripts as a different user than the
> various Pacemaker components, but that would lead too far now.)
well, something we thought about already and a point where the
new feature breaks the ClusterMon-Interface.
Unfortunately the impact is quite high - crmd has dropped privileges -
but if the pain-level rises high enough ...

>
> The SNMP agent seems to have a problem with hrSystemDate, which should
> be an OCTETSTR with strict format, not some plain textual timestamp.
> But I haven't really looked into this yet.
Actually I had tried it with the snmptrap tool coming with RHEL 7.2,
and it worked with the string given in the example.
Did you copy it 1:1? There is a typo in the document: the
double quotes are doubled. The format is strict and there are actually
2 formats allowed - one with timezone and one without. The
format string given should match the latter.


