Re: [ClusterLabs Developers] No ruinings in 2020! (Was: I will ruin your Christmas holidays Developers!)

2020-01-02 Thread Jan Pokorný
On 02/01/20 12:47 +0100, Jan Pokorný wrote:
> On 21/12/19 01:29 -0500, Digimer wrote:
>> I'm not sure how this got through the queue... Sorry for the noise.
> 
> in fact, it did not, from what I can see, meaning that you (and perhaps
> other shadow moderators) do a stellar job, despite this not happening
> in direct sight -- or in other words, the practically non-existent spam
> is proof of how high the bar is (you can attest by scanning the long
> abandoned lists and comparing[*]).
> 
> Thanks for that, and to the broader community, my wishes for the best
> in the new year (whether it has just arrived in your calendar, is about
> to happen soon for you, or at any other occasion that will eventually
> come, alike).
> 
> To summarize, the most generic, high-level agenda regarding (Julian)

^ just teasing your (or my, TBH) acumen, Gregorian is correct here :-)

> year 2020 likely is:
> 
> - cluster summit:
>   http://plan.alteeve.ca/index.php/HA_Cluster_Summit_2020
> 
> - official EOL for Python 2:
>   https://www.python.org/psf/press-release/pr20191220/
> 
> Amendable, indeed, just respond on-list.
> 
>> digimer
>> 
>> On 2019-12-19 1:19 p.m., TorPedoHunt3r wrote:
>>> 
>> 
> 
> [*] These lists are, AFAICT, abandoned; please don't revive them, treat
> them just as a visitor in a reservation would; visiting the links is _not_
> recommended, only at your own risk:
> https://lists.clusterlabs.org/pipermail/pacemaker/2016-August/thread.html
> 
> https://lists.linuxfoundation.org/pipermail/ha-wg-technical/2019-December/thread.html
> 
> P.S. Sorry for piggy-backing here :-)

-- 
Jan (Poki)



[ClusterLabs Developers] No ruinings in 2020! (Was: I will ruin your Christmas holidays Developers!)

2020-01-02 Thread Jan Pokorný
On 21/12/19 01:29 -0500, Digimer wrote:
> I'm not sure how this got through the queue... Sorry for the noise.

in fact, it did not, from what I can see, meaning that you (and perhaps
other shadow moderators) do a stellar job, despite this not happening
in direct sight -- or in other words, the practically non-existent spam
is proof of how high the bar is (you can attest by scanning the long
abandoned lists and comparing[*]).

Thanks for that, and to the broader community, my wishes for the best
in the new year (whether it has just arrived in your calendar, is about
to happen soon for you, or at any other occasion that will eventually
come, alike).

To summarize, the most generic, high-level agenda regarding (Julian)
year 2020 likely is:

- cluster summit:
  http://plan.alteeve.ca/index.php/HA_Cluster_Summit_2020

- official EOL for Python 2:
  https://www.python.org/psf/press-release/pr20191220/

Amendable, indeed, just respond on-list.

> digimer
> 
> On 2019-12-19 1:19 p.m., TorPedoHunt3r wrote:
>> 
> 

[*] These lists are, AFAICT, abandoned; please don't revive them, treat
them just as a visitor in a reservation would; visiting the links is _not_
recommended, only at your own risk:
https://lists.clusterlabs.org/pipermail/pacemaker/2016-August/thread.html

https://lists.linuxfoundation.org/pipermail/ha-wg-technical/2019-December/thread.html

P.S. Sorry for piggy-backing here :-)

-- 
Jan (Poki)



[ClusterLabs Developers] Consensus on to-avoid in pacemaker, unnecessary proliferation of redundant goal-achievers, undocumented options and such? (Was: maintenance vs is-managed; different levels of

2019-12-18 Thread Jan Pokorný
On 18/12/19 02:36 +0100, Jan Pokorný wrote:
> [...]
> 
> - based on the above, increase of redundance/burden, plus
>   maintenance costs not just at pacemaker itself (more complex
>   codebase) but also any external tooling incl. higher level tools
>   (ditto, plus ensuring the change is caught by these at all[*]),
>   confusion on combinability, etc.
> 
> [...]
> 
> [*] for instance, I missed that change when suggesting the equivalent
> to pcs team: https://bugzilla.redhat.com/show_bug.cgi?id=1303969
> but hey, they made do avoiding that configuration addition
> altogether :-)

Oh, this may be caused by the "maintenance" resource meta-attribute not
being documented at all in the 1.1 line (while it was introduced in 1.1.12
and we are now at 1.1.22):
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#_resource_meta_attributes

Again, this is something we should rather prevent (undocumented options,
not even listed in some "experimental" section covering provisions that
may change or disappear again, for things in need of some gradual,
multi-release incubation).

Can we agree on some principles like this?

-- 
Jan (Poki)



Re: [ClusterLabs Developers] [deveoplers] maintenance vs is-managed; different levels of the maintenance property

2019-12-17 Thread Jan Pokorný
On 28/11/19 11:36 +, Yan Gao wrote:
> On 11/28/19 1:19 AM, Ken Gaillot wrote:
>> There is some room for coming up with better option naming and
>> meaning.  For example maybe the cluster-wide "maintenance-mode"
>> should be something like "force-maintenance" to make clear it takes
>> precedence over node and resource maintenance.
> Not sure if renaming would introduce even more confusion ...

+1

> But indeed, documentation definitely makes lot of sense.

+1

> Based on the whole idea, an inconsistent logic is in here then:
> 
> https://github.com/ClusterLabs/pacemaker/commit/9a8cb86573#diff-b4b7b0fdcefcd3eb5087dfbf0d101ec4R471
> 
> We should probably remove the "else" there, so that cluster-wide 
> maintenance-mode=true ALWAYS takes precedence.
> 
> Currently there's a corner case:
> 
> * Cluster maintenance-mode=true
> * Resource is-managed=true
> * Resource maintenance=false
> 
> , which makes an exception that the resource will be "managed".

ouch, slight +1 (oh, these least-surprise concerns -- least surprise
towards existing reliance vs. towards newcomers -- are clearly in
mutual contradiction, making it tough, not for the first time).

Anyway, sorry for picking this as an exemplary showcase (we are all
learning as we go, nobody is born with experience, but since we are
on the dev list, and we are not used to in-community retrospectives
(yet?) slash meta-talk about approaches exactly so as to take
something away, please forgive, it has nothing to do with who,
just what) of how we shall _not_ be extending pacemaker, since
what seemed a simple, straightforward and justified addition of
a new configuration toggle carries, in hindsight, a lot of hidden
costs that were not foreseen at that time:

- most notably that there are now two competing options, one
  being a specialization (expressible with a combination of
  other configuration steps!) of another established option
  (correct me if I am wrong)

- based on the above, any human or machine needs to perform a
  two-step check (easy to miss) to be reasonably sure whether some
  claim holds or not (that the resource will be part of what the
  cluster acts upon)

- based on the above, increase of redundance/burden, plus
  maintenance costs not just at pacemaker itself (more complex
  codebase) but also any external tooling incl. higher level tools
  (ditto, plus ensuring the change is caught by these at all[*]),
  confusion on combinability, etc.

There are bounds to evolution of code if there's some responsibility
behind it, let's keep up on sustainability (in all directions if
possible).  Suggested renaming would be a misstep in that regard
as well, I think.  High-level tools can abstract it whatever way
they like...

Speaking of these (I think they are still rather mid-level tools;
there is an enormous space for improvement in them once they detach
from trying to carry a 1:1 mapping to low-level bits and move closer
to user-oriented concepts, otherwise it feels like using plain TeX
forever when it was shown that a user-oriented simplification like
LaTeX can go far beyond, still benefiting from the universality aspect
of the former) and their advent, I think it's fully OK to resist
urges to combine existing primitives in some composite way unless
there's a clear blocker (risk of race conditions, for instance).
These combinations shall occur higher up, outside (there were some
"middleware" ideas in the talks previously, not sure where that went,
but given a contract on the API, it could well be outside the
pacemaker project).

[*] for instance, I missed that change when suggesting the equivalent
to pcs team: https://bugzilla.redhat.com/show_bug.cgi?id=1303969
but hey, they made do avoiding that configuration addition
altogether :-)

-- 
Jan (Poki)



Re: [ClusterLabs Developers] Building clufter on EL8

2019-10-31 Thread Jan Pokorný
Hi Digimer o/

On 30/10/19 16:24 -0400, Digimer wrote:
> While waiting to see what CentOS 8 will do with regard to HA,

you are not the only one surprised here

> I decided to rebuild the rhel 8 packages for our own repo[1]. To
> this end, I've rebuilt all packages, except clufter.
> 
>   The clufter package relies on jing, and jing is not provided in RHEL
> 8. Obviously, clufter was built for RHEL 8, so I'm curious how this was
> done...

Note that buildroot packages are a superset of packages available
through the main channels, for various non-technical reasons,
e.g. giving up on support for such.  Brand new for RHEL 8 are
"no support" channels like Code Ready Builder (CRB), and it might
be there, or not.

Frankly, I've put quite some effort into having jing (and its sibling,
trang) up for a straightforward grab, but it was basically killed in/by
the process without receiving any further support, leaving me detached
altogether on this political basis.  I can consider myself lucky to
at least have jing in said buildroot :-/

> I started the process of building jing myself, but very quickly fell
> into a very deep dependency well.
> 

> Tips?

Your options are:

1. use jing (and a very few deps, perhaps) from said CRB (if
   available), Fedora or an older CentOS

2. edit the spec file so that it skips the jing-involved steps
   altogether; note that such a measure was added only to provide an
   additional guarantee that even if clufter itself is not updated,
   at least on every rebuild (such as the various mass ones in Fedora)
   the newest pacemaker schema at the time will be automatically
   adopted (clufter requires a single-file type of schema, whereas
   pacemaker ships a decomposed file hierarchy of these, and to that
   end, there is no known way to aggregate the content like this,
   except for some unmaintained XSL stylesheet I found back then and
   did not exactly trust), but for the generic use case it shall be
   OK to use even the older bundled versions; and as mentioned
   earlier, there was no allocation for clufter to catch up on
   various aspects of the recent development, meaning that 3.0+
   schema support is on a may-work basis


Btw. I am a long-time proponent of engaging the jing validator in
pacemaker itself, since libxml2-based RelaxNG schema validation
is not capable of precise diagnostics, and is prone to bad
performance (compared to jing, due to the nature of the different
approaches, I believe) for more complex documents (and/or grammars).
I.e. what we have in pacemaker right now downright hurts the
user experience should there be violations in the base XML.
Besides, libxml2 RelaxNG schema validation tends to be buggy
to this day (just a few months back, I fixed some of these
long-lurking issues, but some accompanying regression tests
effectively require jing because of that).


P.S. I noticed you've sent the question also to cluster-devel
 besides the developers@c.o ML, though not as a single message
 to both, meaning I cannot reply-all conveniently,
 but I tried my best to cover that.

-- 
Jan (Poki)



Re: [ClusterLabs Developers] Extend enumeration of OCF return values

2019-10-16 Thread Jan Pokorný
On 16/10/19 09:18 +, Yan Gao wrote:
> On 10/15/19 4:31 PM, Ken Gaillot wrote:
>> On Tue, 2019-10-15 at 13:08 +0200, Tony den Haan wrote:
>>> Hi,
>>> I ran into getting "error 1" from portblock, so OCF_ERR_GENERIC,
>>> which for me doesn't guarantee the error was RC from portblock or
>>> pacemaker itself.
>>> Wouldn't it be quite useful to
>>> 1) give the agents a unique number to add to the OCF RC code, thus
>>> helping to determine origin of error
>>> 2) show an actual error string instead of "unknown error(1)". This is
>>> the last you want to see when a cluster is stuck.
>>> 
>>> Tony
>> 
>> I agree it's an issue, but the exit codes have to stay fairly generic.
>> There are only 255 possible exit codes, and half of those most shells
>> use for signals. Meanwhile there are dozens of agents. More
>> importantly, Pacemaker needs standard meanings to know how to respond.
>> 
>> However there are possibilities:
>> 
>> - OCF could add a few more codes for common error conditions. (This
>> requires updating the standard, as well as software such as Pacemaker
>> to be aware of them.)
>> 
>> - OCF already supports an arbitrary string "exit reason" which
>> pacemaker will display beyond just "unknown". It's up to the individual
>> agents to support this, and all of them should. Agents can get as
>> specific as they like with exit reasons.
>> 
>> - Agents can also log to the system log, or print error output which
>> pacemaker will log in its detail log. Many already provide good
>> information this way, but there's always room for improvement.
>> 
> All makes sense. A lot of times, I can feel it's the wording "unknown 
> error" that frustrates users since they are definitely not in a good 
> mood seeing any errors in their beloved clusters, not to mention ones 
> that are even "unknown" ;-)
> 
> As a matter of fact, it's probably the most frequently returned error. 
> I'd prefer to call it something different in user interfaces, for 
> example "generic error" or just "error". Since:

\me votes for "sundry error" :-)

Seriously, it is better for getting the right hits from a random
$WEBSEARCHER, since that is the first line of universal defense for
a growing population.  This assumes proper, web-bot-explorable
documentation.

> - If "exit reason" gives a hint, it's not really "unknown".
> - Even if there's no "exit reason" given, it doesn't mean it's 
> "unknown". Usually clues could be found from logs.

-- 
Jan (Poki)



[ClusterLabs Developers] FYI: looks like there are DNS glitches with clusterlabs.org subdomains

2019-10-09 Thread Jan Pokorný
Neither bugs.c.o nor lists.c.o work for me ATM.
Either it resolves by itself, or Ken will intervene, I believe.

-- 
Jan (Poki)



Re: [ClusterLabs Developers] kronosnet v1.12 released

2019-09-20 Thread Jan Pokorný
On 20/09/19 05:22 +0200, Fabio M. Di Nitto wrote:
> We are pleased to announce the general availability of kronosnet v1.12
> (bug fix release)
> 
> [...]
> 
> * Add support for musl libc

Congrats, and the above is great news, since I've been toying with
the idea of putting together a truly minimalistic and vendor-neutral
try-out image based on Alpine Linux, which uses musl as its libc of
choice (for its bloatlessness, just as there's no systemd, etc.).

-- 
Jan (Poki)



Re: [ClusterLabs Developers] performance problems with ocf resource nfsserver script

2019-09-12 Thread Jan Pokorný
Hello Eberhard,

On 11/09/19 10:01 +0200, Eberhard Kuemmerle wrote:
> I use pacemaker with some years-old hardware.
> In combination with an rsync backup, I had nfsserver monitoring
> timeouts that resulted in stonith fencing events...
> 
> So I tested the ocf resource nfsserver script and found that even
> in an idle situation (without rsync or other heavy load), 'nfsserver
> monitor' was running for more than 10 seconds.
> 
> I found two critical actions in the script:
> - systemctl status nfs-server  (which also calls journalctl)
> - systemctl list-unit-files
> 
> So I modified the script and replaced
> 
> systemctl $cmd $svc
> by
> systemctl -n0 $cmd $svc
> in nfs_exec() to suppress the journalctl call
> 
> and
> 
> systemctl list-unit-files
> by
> systemctl list-unit-files 'nfs-*'
> and
> systemctl list-unit-files 'rpc-*'
> 
> That reduced the runtime for 'nfsserver monitor' to less than 0.2
> seconds!

That's a great improvement, indeed!

Thanks for being attentive to these details that actually sometimes
matter, as you could attest with your system.

> So I strongly recommend to integrate that modification in your
> repository.
> 
> [...]
> 
> [actual patch]
> 

Assuming your intention is to upstream your changes (and therefore you
consent to publishing them under the same conditions/license
as applied to that very file per its embedded notice in the header
~ GPLv2+), and assuming that publishing your changes on this list
is your preferred workflow (development itself occurs at GitHub at
this point), I brought the patch to where it will be paid bigger
attention:

https://github.com/ClusterLabs/resource-agents/pull/1398

Feel free to comment further at either location.

Btw. I haven't checked, but per the timestamp alone, I suppose the
doubled messages on this list carry an identical version of the
patch.  Don't stay quiet if either this or the license-keeping
assumption above does not apply, please :-)

-- 
Poki



Re: [ClusterLabs Developers] Reminder that /proc is just rather an unreliable quirk, not a firm grip on processes

2019-07-08 Thread Jan Pokorný
On 03/07/19 11:45 +0200, Jan Pokorný wrote:
> [...]

Incidentally, something fundamentally related to process scanning
and the related imprecise over-approximation just popped up:
https://lists.clusterlabs.org/pipermail/users/2019-July/025978.html

-- 
Jan (Poki)



[ClusterLabs Developers] Reminder that /proc is just rather an unreliable quirk, not a firm grip on processes

2019-07-03 Thread Jan Pokorný
[in a sense, this is a follow-up for my recent post:
https://lists.clusterlabs.org/pipermail/users/2019-May/025749.html]

Have come across an interesting experience regarding /proc traversal:

https://rkeene.org/projects/info/wiki/173

(as well as a danger of exhausting available inodes mentioned in the
new spurred discussion: https://lobste.rs/s/ihz50b/day_proc_died)

Even if it wasn't observed with Linux in that particular case, it just
adds to the overall arguments for avoiding it, directly or indirectly
(that's what ps, pidof, killall etc. make use of), whenever possible,
for instance:

- (at least on most systems) no snapshot semantics, meaning the
  scan-through is completely racy and ephemeral processes (or
  a fork chain thereof, see also CVE-2018-1121 for intentional
  carefully crafted abuse) are easy to miss completely

- problem of recycled PIDs is imminent (however theoretical), when
  the observer cannot subscribe itself to watch for changes in the
  process under supervision (verging on problems related to polling
  vs. event based systems, incl. timely responses to changes)

- finally, all these problems with unexpected behaviours of /proc
  under corner case situations like that mentioned initially, but
  add the possibility that arbitrary unprivileged users can
  deliberately block /proc enumeration triggered in other processes
  incl. privileged ones in Linux systems (see CVE-2018-1120[*]),
  for instance

Now, why am I mentioning this: higher layers of the cluster stack rely
heavily on /proc inspection, the net outcome being that they can only
be as reliable as the /proc filesystem is, not more.

So my ask here is to use our brain cluster (pun intended) so as
to devise ways to get less reliant on /proc-based enumeration.
One portable idea is to allow for agent persistency, i.e., the
agent would be directly informed about its child (effectively the
service being run, as proxied by this agent instance).  One
non-portable idea would be to leverage the pidfd facility recently
introduced into Linux (as already mentioned in May's post).

Good news is that there's still room for _also_ cheap improvements,
such as what I did along with the recent security fixes for pacemaker
(in a nutshell: IPC end-points already constitute system-wide
singletons, equivalent for our purposes to checking via /proc,
allowing for a swap, and -- as a paradox -- this positive change
was secondary as it effectively enabled us to close the security
hole at hand, which was the primary objective).
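
To make the gist of that swap concrete (a hypothetical helper, not the
actual pacemaker code), probing a libqb IPC endpoint that a daemon
instance would own is enough to tell whether it is already running,
with no /proc enumeration:

    #include <stdbool.h>
    #include <qb/qbipcc.h>

    /* the IPC endpoint acts as a system-wide singleton, so a successful
     * client connection means an instance of the daemon is already up */
    static bool
    daemon_already_running(const char *ipc_name)
    {
        /* 512 is an arbitrary small response-buffer size for the probe */
        qb_ipcc_connection_t *conn = qb_ipcc_connect(ipc_name, 512);

        if (conn == NULL) {
            return false;  /* nobody is serving that endpoint */
        }
        qb_ipcc_disconnect(conn);
        return true;
    }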

Apparently, the most affected are resource agents.

[*] I've mentioned such risks once on this list already:
https://lists.clusterlabs.org/pipermail/developers/2018-May/001237.html
but alas, it received no responses

-- 
Jan (Poki)



Re: [ClusterLabs Developers] If anybody develops against libpe_status.so: skipped soname bump (in 2.0.2)

2019-06-19 Thread Jan Pokorný
On 14/06/19 18:46 -0500, Ken Gaillot wrote:
> On Fri, 2019-06-14 at 23:57 +0200, Jan Pokorný wrote:
>> On 14/06/19 14:56 -0500, Ken Gaillot wrote:
>>> On Fri, 2019-06-14 at 20:13 +0200, Jan Pokorný wrote:
>>>>> On Thu, 2019-06-06 at 10:12 -0500, Ken Gaillot wrote:
>>> Since those functions are internal API, we don't need a soname
>>> bump.  Distributing the header was a mistake and should not be
>>> considered making it public API. The only functions in there that
>>> had doxygen blocks were marked internal, so that helps.
>>> 
>>> As an aside, the entire libpe_status was undocumented until 2.0.1,
>>> but that was an oversight (a missing \file line).
>> 
>> In FOSS, (un)documentation aspect doesn't play that much of a role...
>> 
>>> In practice there were some projects that used it, and we did bump
>>> the soname for most functions. Now however it's documented
>>> properly, so the line should be clear.
>> 
>> Not at all, see above.
>> 
>> Traces of the pre-existing mess have some momentum.
>> 
>> Anyway, good to know the root cause, question is how to deal with
>> the still real fallout.
> 
> What's the fallout? An internal function

"sort of", but definitely only after said forthcoming change :-)

> that no external application uses changed

"sort of", but they could with the header interpretable as public
(since Pacemaker-1.1.15), just wasn't discovered before (I don't
think I ever tried to match the changes back to the headers, plus
how these headers are to be interpreted)

> which doesn't require a soname bump.
> 
> I'll handle it by renaming the header and moving it to noinst.

Yes, that will help going forward.

This thread hopefully (justly) mitigates any surprising sharp edges
up to this point (effectively towards any potential usage established
in the 1.1.15 - 2.0.1 timespan) should there be any.

Anyway, it looks like libabigail is a very useful tool we might
consider alongside or instead of abi-compliance-checker.
It looks like it can be told precisely which headers are private
and which are not, so there could even be some sort of authoritative
listing (regardless of documentation or not, as mentioned, that's
secondary with FOSS projects) to source that from.

One idea there would be to add another, standalone pass to our
Travis CI tests that would leverage TRAVIS_COMMIT_RANGE env. variable
(too bad that it's all stateless, without any convenient lookaside
storage, or is there?) to get the two builds (looks like it could
naturally be done in rather an efficient manner) for a subsequent
ABI comparison.  Either merely "informative" (i.e., pass unless there's
an actual build failure), or "punishing" if we can afford to switch
more into "always ready" paradigm (which CI is all about) -- when the
pull request destructs the ABI in some not-mere-addition way (while
soname bump didn't occur?  or when there are at least any API-ABI
hurdles found?), raise a flag.  It would then be up to deliberation
whether it's a blocker or not.  But would attract the attention for
sure, hence more care, in an ahead-of-time fashion.

-- 
Jan (Poki)



Re: [ClusterLabs Developers] If anybody develops against libpe_status.so: skipped soname bump (Was: Pacemaker 2.0.2 final release now available)

2019-06-14 Thread Jan Pokorný
On 14/06/19 14:56 -0500, Ken Gaillot wrote:
> On Fri, 2019-06-14 at 20:13 +0200, Jan Pokorný wrote:
>>> On Thu, 2019-06-06 at 10:12 -0500, Ken Gaillot wrote:
>>> 
>>> Source code for the Pacemaker 2.0.2 and 1.1.21 releases is now
>>> available:
>>> 
>>> 
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.0.2
>>> 
>>> 
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.21
>> 
>> In retrospect (I know, everybody is a general once the battle is
>> over), called out for with some automated tests in Fedora, there were
>> some slight discrepancies -- depending on whether any external
>> clients
>> of particular "wannabe internal" libraries of pacemaker accompanied
>> with "wannabe internal" headers, none of which are marked so
>> expressly
>> (and in case of headers, are usually shipped in dev packages anyway).
> 
> All public API is documented:
> 
>   https://clusterlabs.org/pacemaker/doxygen/
> 
> Anything not documented there is private API.

It's a rather simplistic (and hypocritical) view when it is not part
of any written "contract", isn't it? :-)

> remote.h should be in noinst_HEADERS, thanks for catching that. It
> would also be a good idea to put "_internal" in all internal
> headers' names to be absolutely clear; most of them already have it.

Yes, that was that surprising moment here.

>> For the peace of mind, I am detailing the respective library that
>> would likely have been eligible for an explicit soname bump and why.
>> If you feel affected, please speak up so we have a clear incentive to
>> publish a "hotfix" for downstreams and direct consumers, otherwise
>> at least I don't feel compelled to anything immediate beyond this
>> FYI,
>> and we shall rather do it in 2.0.3 even if not otherwise justified
>> with an inter-release delta, so there isn't the tiniest glitch possible
>> when 2.0.2 is skipped on the upgrade path (which is generally not
>> recommended but would be understandable if you happen to rely on
>> those very libpe_status.so ABI details).
>> 
>> The mentioned ABI changes are:
>> 
>> * libpe_status.so.28.0.2 (2.0.1: soname 28.0.1)
>>   - include/crm/pengine/remote.h: function renames, symbolic notation:
>> { -> pe__}{is_baremetal_remote_node -> is_remote_node,
>>is_container_remote_node -> is_guest_node,
>>is_remote_node -> is_guest_or_remote_node,
>>is_rsc_baremetal_remote_node -> resource_is_remote_conn,
>>rsc_contains_remote_node -> resource_contains_guest_node}
>> 
>> (all other ABI breaking changes appear self-contained for not
>> being related to anything exposed through what could be considered
>> a public header/API -- not to be confused with ABI)
> 
> Since those functions are internal API, we don't need a soname bump.
> Distributing the header was a mistake and should not be considered
> making it public API. The only functions in there that had doxygen
> blocks were marked internal, so that helps.
> 
> As an aside, the entire libpe_status was undocumented until 2.0.1,
> but that was an oversight (a missing \file line).

In FOSS, (un)documentation aspect doesn't play that much of a role...

> In practice there were some projects that used it, and we did bump
> the soname for most functions. Now however it's documented properly,
> so the line should be clear.

Not at all, see above.

Traces of the pre-existing mess have some momentum.

Anyway, good to know the root cause, question is how to deal with
the still real fallout.

>> Note that there's at least a single publicly known consumer of
>> libpe_status.so, but luckily, sbd only uses some unaffected pe_*
>> functions.  Said after-the-fact bump of said library would require
>> it to be rebuilt as well (and all the SW that'd be in the same
>> boat), so even less appealing to do that now, but note that
>> such rebuild will be needed with said planned bump for 2.0.3.
>> 
>> But perhaps, some other changes as announced in [1] will be faster
>> than that -- to that account, I'd note that perhaps applying
>> single source -> multiple binary copies of code scheme is not all
>> that bad and we could move some of shared internal only code into
>> static libraries subsequently used to feed the links from the
>> actual daemons/tools code objects -- or the private libraries
>> shall at least be factually privatized/unshared, i.e., put into
>> a private, non-standard location (this is what, e.g., systemd uses)
>> where only "accustomed" executables can find them.
>> 
>> [1]https://lists.clusterlabs.org/pipermail/developers/2019-February/001358.html

-- 
Jan (Poki)



Re: [ClusterLabs Developers] If anybody develops against libpe_status.so: skipped soname bump (Was: Pacemaker 2.0.2 final release now available)

2019-06-14 Thread Jan Pokorný
On 14/06/19 20:13 +0200, Jan Pokorný wrote:
> For the peace of mind, I am detailing the respective library that
> would likely have been eligible for an explicit soname bump and why.
> If you feel affected, please speak up so we have a clear incentive to
> publish a "hotfix" for downstreams and direct consumers, otherwise
> at least I don't feel compelled to anything immediate beyond this FYI,
> and we shall rather do it in 2.0.3 even if not otherwise justified
> with an inter-release delta, so there isn't the tiniest glitch possible
> when 2.0.2 is skipped on the upgrade path (which is generally not
> recommended but would be understandable if you happen to rely on
> those very libpe_status.so ABI details).

Of course, an alternative applicable right now (also suitable for
those who self-compile ... and use libpe_status.so in their client
code at the same time) and avoiding the soname bump is to add the
original symbols back, e.g. using

  __attribute__((alias ("original_name")))
  
for brevity.
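
To make that concrete, a self-contained sketch of the aliasing trick
(generic names only, not the actual pacemaker symbols; in practice it
would target the pe__* renames listed below, and the alias must live in
the same translation unit as its target):

    #include <stdbool.h>

    /* the new name, carrying the actual implementation */
    bool
    new_name(int value)
    {
        return value > 0;
    }

    /* the old name re-exported as an alias of the new one, so consumers
     * linked against the old symbol keep resolving it without a soname
     * bump (GCC/Clang extension) */
    bool old_name(int value) __attribute__((alias("new_name")));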

> The mentioned ABI changes are:
> 
> * libpe_status.so.28.0.2 (2.0.1: soname 28.0.1)
>   - include/crm/pengine/remote.h: function renames, symbolic notation:
> { -> pe___}{is_baremetal_remote_node -> is_remote_node,
> is_container_remote_node -> is_guest_node,
>   is_remote_node -> is_guest_or_remote_node,
>   is_rsc_baremetal_remote_node -> resource_is_remote_conn,
>   rsc_contains_remote_node -> resource_contains_guest_node}
> 
> (all other ABI breaking changes appear self-contained for not
> being related to anything exposed through what could be considered
> a public header/API -- not to be confused with ABI)

-- 
Jan (Poki)



[ClusterLabs Developers] If anybody develops against libpe_status.so: skipped soname bump (Was: Pacemaker 2.0.2 final release now available)

2019-06-14 Thread Jan Pokorný
> On Thu, 2019-06-06 at 10:12 -0500, Ken Gaillot wrote:
> 
> Source code for the Pacemaker 2.0.2 and 1.1.21 releases is now
> available:
> 
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.0.2
> 
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.21

In retrospect (I know, everybody is a general once the battle is
over), as called out by some automated tests in Fedora, there were
some slight discrepancies -- relevant depending on whether there are
any external clients of particular "wannabe internal" libraries of
pacemaker accompanied with "wannabe internal" headers, none of which
are marked so expressly (and in the case of headers, they are usually
shipped in dev packages anyway).

For the peace of mind, I am detailing the respective library that
would likely have been eligible for an explicit soname bump and why.
If you feel affected, please speak up so we have a clear incentive to
publish a "hotfix" for downstreams and direct consumers, otherwise
at least I don't feel compelled to anything immediate beyond this FYI,
and we shall rather do it in 2.0.3 even if not otherwise justified
with an inter-release delta, so there isn't the tiniest glitch possible
when 2.0.2 is skipped on the upgrade path (which is generally not
recommended but would be understandable if you happen to rely on
those very libpe_status.so ABI details).

The mentioned ABI changes are:

* libpe_status.so.28.0.2 (2.0.1: soname 28.0.1)
  - include/crm/pengine/remote.h: function renames, symbolic notation:
{ -> pe___}{is_baremetal_remote_node -> is_remote_node,
is_container_remote_node -> is_guest_node,
is_remote_node -> is_guest_or_remote_node,
is_rsc_baremetal_remote_node -> resource_is_remote_conn,
rsc_contains_remote_node -> resource_contains_guest_node}

(all other ABI breaking changes appear self-contained for not
being related to anything exposed through what could be considered
a public header/API -- not to be confused with ABI)

Note that there's at least a single publicly known consumer of
libpe_status.so, but luckily, sbd only uses some unaffected pe_*
functions.  Said after-the-fact bump of said library would require
it to be rebuilt as well (and all the SW that'd be in the same
boat), so even less appealing to do that now, but note that
such rebuild will be needed with said planned bump for 2.0.3.

But perhaps, some other changes as announced in [1] will be faster
than that -- to that account, I'd note that perhaps applying
single source -> multiple binary copies of code scheme is not all
that bad and we could move some of shared internal only code into
static libraries subsequently used to feed the links from the
actual daemons/tools code objects -- or the private libraries
shall at least be factually privatized/unshared, i.e., put into
a private, non-standard location (this is what, e.g., systemd uses)
where only "accustomed" executables can find them.

[1] https://lists.clusterlabs.org/pipermail/developers/2019-February/001358.html

-- 
Poki



[ClusterLabs Developers] Multiple processes appending to the same log file questions (Was: Pacemaker detail log directory permissions)

2019-04-30 Thread Jan Pokorný
[let's move this to developers@cl.o, please drop users on response
unless you are only subscribed there, I tend to only respond to the
lists]

On 30/04/19 13:55 +0200, Jan Pokorný wrote:
> On 30/04/19 07:55 +0200, Ulrich Windl wrote:
>>>>> Jan Pokorný  schrieb am 29.04.2019 um 17:22
>>>>> in Nachricht <20190429152200.ga19...@redhat.com>:
>>> On 29/04/19 14:58 +0200, Jan Pokorný wrote:
>>>> On 29/04/19 08:20 +0200, Ulrich Windl wrote:
>> I agree that multiple threads in one thread have no problem using
>> printf(), but (at least in the buffered case) if multiple processes
>> write to the same file, that type of locking doesn't help much IMHO.
> 
> Oops, you are right, I made a logical shortcut connecting flockfile(3)
> and flock(1), which was entirely unbacked.  You are correct it would
> matter only amongst the threads, not otherwise unsynchronized processes.
> Sorry about the noise :-/
> 
> Shamefully, this rather important (nobody wants garbled log messages)
> aspect is in no way documented in libqb's context (it does not do any
> explicit locking on its own), especially since the length of the logged
> messages can go above the default of 512 B (in the upcoming libqb 2,
> IIUIC) ... and luckily, I was steering the direction to still stay
> modest and cap that on 4 kiB, even if for other reasons:
> 
> https://github.com/ClusterLabs/libqb/pull/292#issuecomment-361745575
> 
> which still might be within Linux + typical FSs (ext4) boundaries
> to guarantee atomicity of an append (or maybe not even that, it all
> seems a gray area of any guarantees provided by the underlying system,
> inputs from the experts welcome).  Anyway, ISTM that we should at the
> very least increase the buffer size for block buffering to the
> configured one (up to 4 kiB as mentioned) if BUFSIZ would be less,
> to prevent tainting this expected atomicity from the get-go.

The post below seems to indicate that while the equivalent of BUFSIZ
is 8 kiB with glibc (confirmed on Fedora/x86-64), it might possibly be
1 kiB only on BSDs (unless the underlying FS provides a hint
otherwise?), so an opt-in maxed-out message (of 4 kiB currently, but
generic guards are in order in case anybody decides to bump that even
further) might readily cause log corruption on some systems with
multiple processes appending to the same file:

https://github.com/the-tcpdump-group/libpcap/issues/792

Could any BSD people in particular advise here on what "atomic append"
on their system means, i.e. which conditions need to be assuredly met?
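
For illustration only (not libqb code, names made up), the buffering
adjustment hinted at above boils down to opening the log for appending
and giving stdio a buffer at least as large as the longest message, so
that a full record is normally handed to the kernel in a single
write(2) -- a mitigation of, not a guarantee for, append atomicity:

    #include <stdio.h>

    /* assumed message cap, mirroring the 4 kiB limit discussed above */
    #define LOG_MSG_MAX 4096

    /* an append-mode log stream whose stdio buffer is never smaller
     * than one maximal message */
    static FILE *
    open_log(const char *path)
    {
        FILE *fp = fopen(path, "a");

        if (fp == NULL) {
            return NULL;
        }
        /* must happen before the first I/O on the stream */
        if (BUFSIZ < LOG_MSG_MAX) {
            setvbuf(fp, NULL, _IOFBF, LOG_MSG_MAX);
        }
        return fp;
    }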

-- 
Jan (Poki)



Re: [ClusterLabs Developers] Using ClusterLabs logo

2019-04-29 Thread Jan Pokorný
On 29/04/19 16:31 +0200, Kristoffer Grönlund wrote:
> Tomas Jelinek  writes:
>> Is it OK to use ClusterLabs logo as a favicon for pcs in upstream? If 
>> so, are there any conditions to meet?
> 
> Yes, this would be OK to me at least (as the creator of the logo)!
> 
>> 
>> I went through new logo threads in mailinglists but I didn't find 
>> anything specific other than this:
> 
> I don't remember the specific license we decided on back then, but at
> least to me, CC-BY would make sense, where a link to clusterlabs.org
> would be sufficient attribution I think.
> 
> https://creativecommons.org/licenses/by/4.0/

Also a technical note:

Would it then be possible for you to go through *.svg files
you authored in https://github.com/ClusterLabs/clusterlabs-www.git
and add the respective licenses there?

Should be as easy as:
- open with inkscape
- Shift+Ctrl+D (File -> Document Properties)
- select the respective license (or by URI),
  perhaps edit some more metadata
- save again

Seems more appropriate for the author to do this himself if it's
indeed his intention :-)

Thanks!

-- 
Jan (Poki)



[ClusterLabs Developers] [pacemaker] downstream packagers&direct consumers: is bison prereq OK with you?

2019-04-26 Thread Jan Pokorný
It seems extraneous to carry the results of *.y file processing in the
tree (and hence in what we call distribution tarballs at the time).
Hence the simple question: are you OK with bison (not yacc, even
though the compatibility fix appears to be a sed one-liner) becoming
a new dependency?

It's also not clear, provided that's fine with you, whether there
should still be some wiggle room, making the feature that would
require it optional, even though I'd prefer strict uniformity
here for documentation purposes etc. (unless the platform at hand
is not catching up, something like < 3% of deployments, perhaps).

At this point, very tentative context for the curious
(it currently carries both *.y and the respective *.c, but I don't
like the diffstat at all :-)

  https://github.com/ClusterLabs/pacemaker/pull/1756

Please, speak up.

-- 
Jan (Poki)



Re: [ClusterLabs Developers] FYI: github policy change potentially affecting ssh/app access to repositories

2019-04-26 Thread Jan Pokorný
On 25/04/19 22:41 +0200, Jan Pokorný wrote:
> On 25/04/19 11:27 -0500, Ken Gaillot wrote:
>> FYI OAuth access restrictions are now in place on the ClusterLabs
>> organization account.
>> 
>> [...]
>> 
>> If you use an app that needs repo access, I believe a request to allow
>> it will be sent automatically, but if problems arise just mention them
>> here or to me directly.
> 
> Looks like Travis CI integration is also affected, at least in case of
> pacemaker:
> 
> https://github.com/ClusterLabs/pacemaker/pull/1759#issuecomment-486817936

Confirming it works now, apparently thanks to some more intervention
by Ken.

>> On Wed, 2019-04-10 at 17:44 -0500, Ken Gaillot wrote:
>>> Florian Haas and Kristoffer Grönlund noticed that the ClusterLabs
>>> organization on github currently carries over any app access that
>>> members have given to their own accounts.
>>> 
>>> This is not significant at the moment since we don't have any private
>>> repositories and few accounts have write access, but to stay on the
>>> safe side, we'd like to enable OAuth access restrictions on the
>>> organization account.
>>> 
>>> Going forward, this will simply mean that any apps that need access
>>> will need to be approved individually by one of the administrators.
>>> 
>>> But as a side effect, this will invalidate existing apps' access as
>>> well as some individual contributors' ssh key access to the
>>> repositories. If you are affected, you can simply re-upload your ssh
>>> key and it will work again.

-- 
Jan (Poki)



Re: [ClusterLabs Developers] FYI: github policy change potentially affecting ssh/app access to repositories

2019-04-25 Thread Jan Pokorný
On 25/04/19 11:27 -0500, Ken Gaillot wrote:
> FYI OAuth access restrictions are now in place on the ClusterLabs
> organization account.
> 
> [...]
> 
> If you use an app that needs repo access, I believe a request to allow
> it will be sent automatically, but if problems arise just mention them
> here or to me directly.

Looks like Travis CI integration is also affected, at least in case of
pacemaker:

https://github.com/ClusterLabs/pacemaker/pull/1759#issuecomment-486817936

> On Wed, 2019-04-10 at 17:44 -0500, Ken Gaillot wrote:
>> Florian Haas and Kristoffer Grönlund noticed that the ClusterLabs
>> organization on github currently carries over any app access that
>> members have given to their own accounts.
>> 
>> This is not significant at the moment since we don't have any private
>> repositories and few accounts have write access, but to stay on the
>> safe side, we'd like to enable OAuth access restrictions on the
>> organization account.
>> 
>> Going forward, this will simply mean that any apps that need access
>> will need to be approved individually by one of the administrators.
>> 
>> But as a side effect, this will invalidate existing apps' access as
>> well as some individual contributors' ssh key access to the
>> repositories. If you are affected, you can simply re-upload your ssh
>> key and it will work again.

-- 
Jan (Poki)



Re: [ClusterLabs Developers] [ClusterLabs] Coming in 2.0.2: check whether a date-based rule is expired

2019-04-23 Thread Jan Pokorný
On 16/04/19 12:38 -0500, Ken Gaillot wrote:
> We are adding a "crm_rule" command

Wouldn't `pcmk-rule` be a more sensible command name -- I mean, why not
benefit from not suffering the historical burden in this case, given
that `crm` in the broadest "almost anything that can be associated with
our cluster SW" sense is an anachronism, whereas the term metamorphosed
into the invoking name of the original management shell project
(heck, we don't have `crmd` as a daemon name anymore)?

> that has the ability to check > whether a particular date-based rule is
> currently in effect.
> 
> The motivation is a perennial user complaint: expired constraints
> remain in the configuration, which can be confusing.
> 
> [...]
> 
> The new command gives users (and high-level tools) a way to determine
> whether a rule is in effect, so they can remove it themselves, whether
> manually or in an automated way such as a cron.
> 
> You can use it like:
> 
> crm_rule -r  [-d ] [-X ]
> 
> With just -r, it will tell you whether the specified rule from the
> configuration is currently in effect. If you give -d, it will check as
> of that date and time (ISO 8601 format).

Uh, the date-time data representations (encodings of the singular
information) shall be used with some consideration towards the
use cases:

1. _data-exchange friendly_, point-of-use-context-agnostic
   (yet timezone-respecting if need be) representation
   - this is something you want to have serialized in data
 to outlive the code (extrapolated: for exchange between
 various revisions of the same code)
   - ISO 8601 fills the bill

2. _user-friendly_, point-of-use-context-respecting representation
   - this is something you want user to work with, be it the
 management tools or helpers like crm_rule
   - ISO 8601 _barely_ fills the bill, fails in basic attempts at
 integration with the surrounding system:

$ CIB_file=cts/scheduler/date-1.xml ./tools/crm_rule -c \
-r rule.auto-2 -d "next Monday 12:00"
> (crm_abort)   error: crm_time_check: Triggered assert at iso8601.c:1116 : 
> dt->days > 0
> (crm_abort)   error: parse_date: Triggered assert at iso8601.c:757 : 
> crm_time_check(dt)

 no good, let's try old good coreutils' `date` as the "chewer"

$ CIB_file=cts/scheduler/date-1.xml ./tools/crm_rule -c \
-r rule.auto-2 -d "$(date -d "next Monday 12:00")"
> (crm_abort)   error: crm_time_check: Triggered assert at iso8601.c:1116 : 
> dt->days > 0
> (crm_abort)   error: parse_date: Triggered assert at iso8601.c:757 : 
> crm_time_check(dt)

 still no good, so after few more iterations:

$ CIB_file=cts/scheduler/date-1.xml ./tools/crm_rule -c \
-r rule.auto-2 -d "$(date -Iminutes -d "next Monday 12:00")"
> Rule rule.auto-2 is still in effect

 that could be much more intuitive + locale-driven (assuming users
 have the locales set per what's natural to them/what they are
 used to), couldn't it?

I mean, at least allowing `-d` switch in `crm_rule` to support
LANG-native date/time specification makes a lot of sense to me:
https://github.com/ClusterLabs/pacemaker/pull/1756

Perhaps iso8601 (and more?) would deserve the same, even though it
smells of dragging some compatibility/interoperability into the
game?  (at least, `crm_rule` is brand new, moreover marked
experimental, anyway; let's discuss this part at the developers ML
if need be -- one more thing to possibly put up for debate, actually:
this user interface sanitization could be performed merely in the
opaque management shell wrappings, but if nothing else, it amounts
to duplication of work and makes bare-bones use a bit of a PITA).

-- 
Jan (Poki)



Re: [ClusterLabs Developers] FYI: github policy change potentially affecting ssh/app access to repositories

2019-04-15 Thread Jan Pokorný
On 14/04/19 22:48 +0200, Valentin Vidic wrote:
> On Wed, Apr 10, 2019 at 05:44:45PM -0500, Ken Gaillot wrote:
>> Florian Haas and Kristoffer Grönlund noticed that the ClusterLabs
>> organization on github currently carries over any app access that
>> members have given to their own accounts.
> 
> Related to github setup, I just noticed that some ClusterLabs repos
> don't have Issues tab enabled, but I suppose this was intentional?

I think that's very intentional, for several reasons:

* proliferation of issue trackers to watch for a single component is
  just a distraction (also for all those nice reporters that care
  about not filing duplicates), some started before GitHub (or as
  a replacement of a prior non-GH tracking), and will likely stay
  ever after (which is good, see below);
  also, speaking for clufter, I chose to use pagure.io as a primary
  home, so only kept issue tracking enabled there, which makes
  a tonne of sense (aligned with the above concern)

* as we know in HA, it's no good to put all the eggs in one basket
  (a.k.a. SPOF avoidance) -- git is trivial to move around since
  it's distributed by nature, and is continuously mirrored by many
  individuals, so the outliers (important points mentioned just as
  PR commentaries etc.) shall preferably be as sparse as possible[1];
  the tracked issues themselves would not be that easy to recover
  back if GitHub stopped working this minute (however unexpectedly),
  I'd actually suggest that ClusterLabs projects with issue tracking
  enabled would opt-in to communication collection to some extra
  read-only mailing list that'd be established for that archival
  purpose (dev-pulse@cl.o?  technically, likely another GH account
  with casual communication email address set to that of this list,
  and subscribed to all these projects; note that bugs.clusterlabs.org
  Bugzilla instance could also forward there) and possibly also
  mirrored with some 3rd party services (too bad that Gmane passed out)

* partly related to that is the flexibility wrt. which forge to choose
  as authoritative, but I believe the data migration freedom is quite
  reasonable here, so there's no data lock-in per se
  (I am still a proponent of switching to GitLab; a recent lengthy PR
  at GH demonstrated how unscalable these ongoing iterations within
  a single PR are there)

[1] see point 1. at
https://lists.clusterlabs.org/pipermail/developers/2018-January/001958.html

-- 
Jan (Poki)



Re: [ClusterLabs Developers] Karma needed - Re: Updated kronosnet Fedora / EPEL packages to v1.8

2019-04-11 Thread Jan Pokorný
Hello Digimer,

On 11/04/19 01:09 -0400, digimer wrote:
> Would anyone with time and inclination please review / vote for
> these packages? Would like to get them pushed out if possible, short
> a vote each.

FYI, you will be allowed to push to stable at latest 7 days after
filing the update, regardless of karma.

Shouldn't this rather target the users list, preferably with a
"[FEDORA/EPEL]" tag to save the time of distro-unaffiliated
users?

-- 
Jan (Poki)



Re: [ClusterLabs Developers] strange migration-threshold overflow, and fail-count update aborting it's own recovery transition

2019-04-05 Thread Jan Pokorný
On 05/04/19 17:19 +0200, Lars Ellenberg wrote:
> On Fri, Apr 05, 2019 at 09:56:51AM -0500, Ken Gaillot wrote:
>> On Fri, 2019-04-05 at 09:44 -0500, Ken Gaillot wrote:
>>> On Fri, 2019-04-05 at 15:50 +0200, Lars Ellenberg wrote:
 But in this case, someone tried to be smart
 and set a migration-threshold of "very large",
 in this case the string in xml was: , 
 and that probably is "parsed" into some negative value,
>>> 
>>> Anything above "INFINITY" (actually 1,000,000) should be mapped to
>>> INFINITY. If that's not what happens, there's a bug. Running
>>> crm_simulate in verbose mode should be helpful.
> 
> I think I found it already.
> 
> char2score() does crm_parse_int(),
> and reasonably assumes that the result is the parsed int.
> Which it is not, if the result is -1, and errno is set to EINVAL or
> ERANGE ;-)
> 
>char2score -> crm_parse_int 
>""  -> result of strtoll is > INT_MAX,
>result -1, errno ERANGE
>migration_threshold = -1;
> 
> Not sure what to do there, though.
> Yet an other helper,
> mapping ERANGE to appropriate MIN/MAX for the conversion?
> 
> But any "sane" configuration would not even trigger that.

Exactly, but the configuration data model is sinfully underspecified,
although it's not the only problem there.

> Where and how would we point out the "in-sane-ness" to the user,
> though?

I think the correct answer is version 4 of the CIB schema with
proper data-typing.  Version 3 was just an upgrade trigger + gating.

 which means the fail-count=1 now results in "forcing away ...",
 different resource placements,
 and the file system placement elsewhere now results in much more
 actions, demoting/role changes/movement of other dependent
 resources
 ...
 
 
 So I think we have two issues here:
 
 [...]
 
 b) migration-theshold (and possibly other scores) should be
 properly parsed/converted/capped/scaled/rejected
>>> 
>>> That should already be happening

See the schema-based enforcement that's currently missing, though it
shall be present, to avoid such problems as early as possible.

-- 
Nazdar,
Jan (Poki)



[ClusterLabs Developers] Easy opt-in copyright delegation assignment (Was: Feedback wanted: proposed new copyright policy for Pacemaker)

2019-03-11 Thread Jan Pokorný
On 11/03/19 13:49 -0500, Ken Gaillot wrote:
> There's a pull request for the new policy in case anyone is interested:
> 
> https://github.com/ClusterLabs/pacemaker/pull/1716

As I mentioned there, this could be a possible next evolution step, but
it's in no hurry (unlike the former one of reality reflection, perhaps):

> In an outlook, I'd like to also see some simplification regarding the
> opt-in desire to assign the respective portional copyright of the
> changesets to come to a designated other party, typically an employer,
> as a pragmatic (and voluntary loyalty) legalese measure.
> 
> What was devised in a private discussion with Ken was adding an
> AFFILIATION.md file to the tree root, and mapping there
> (with enumeration or wildcards) the well-known "Signed-off-by"
> line email addresses to the respective recipient entity
> plus the start date it comes to effect for the particular item.
> Then, the pacemaker project would gain a clear semantics for
> Signed-off-by lines, and this copyright delegation would be
> trivial once established in AFFILIATION.md.

I also have more concrete wording for the projected AFFILIATION.md
file header to run by you, but that might be premature if there's
some early criticism about this idea as such (any other idea that
would still simplify the objective at hand?).

Also, is there possibly some collective wildcard-matching catch-all
consensus about this for particular companies with active
involvement in the project, either due to policy or based simply
on a unisono agreement?

For instance, it seems that RH is relaxed about this topic, though
for myself as an employee tasked with this on-behalf-of-work, I'd
like to express my intention towards the company explicitly anyway
(as I'd normally do with new files I'd start on the project, at
least prior to proposed unification), for being rather pragmatic.

Feedback wanted on this as well, thanks in advance.

-- 
Jan (Poki)


pgpAjbgzi9hub.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] libqb: Re: 633f262 logging: Remove linker 'magic' and just use statics for logging callsites (#322)

2019-02-27 Thread Jan Pokorný
Late to the party (for some rather personal reasons), but
anyway, I don't see any progress while there's a pressing need
to resolve at least a single thing for sure before the release,
so here I go...

On 18/01/19 18:53 +0100, Lars Ellenberg wrote:
> On Thu, Jan 17, 2019 at 09:09:11AM +1100, Andrew Beekhof wrote:
>>> On 17 Jan 2019, at 2:59 am, Ken Gaillot  wrote:
>>> I'm not familiar with the reasoning for the current setup, but
>>> pacemaker's crm_crit(), crm_error(), etc. use qb_logt(), while
>>> crm_debug() and crm_trace() (which won't be used in ordinary runs) do
>>> something similar to what you propose.
>>> 
>>> Pacemaker has about 1,700 logging calls that would be affected
>>> (not counting another 2,000 debug/trace). Presumably that means
>>> Pacemaker currently has about +16KB of memory overhead and
>>> binary size for debug/trace logging static pointers, and that
>>> would almost double using them for all logs. Not a big deal
>>> today? Or meaningful in an embedded context?
>>> 
>>> Not sure if that overhead vs runtime trade-off is the original
>>> motivation or not, but that's the first thing that comes to mind.
>> 
>> I believe my interest was the ability to turn them on dynamically
>> in a running program (yes, i used it plenty back in the day) and
>> have the overhead be minimal for the normal case when they weren't
>> in use.

That's what the run-time configuration of the filtering per log
target (or per tags, even) is for, and generally, what the tracing
library should allow one to do naturally, isn't it?

Had there been an enormous impact in the "normal case", as you put it,
it would be a bug/misfeature asking for new native approaches.

> Also, with libqb before the commit mentioned in the subject
> (633f262) and that is what pacemaker is using right now, you'd get
> one huge static array of "struct callsites" (in a special linker
> section; that's the linker magic that patch removes).

Yes, heap with all the run-time book-keeping overhead vs. cold data
used to be one of the benefits.

> Note: the whole struct was statically allocated,
> it is an array of structs, not just an array of pointers.
> 
> sizeof(struct qb_log_callsite) is 40
> 
> Now, those structs get dynamically allocated,
> and put in some lineno based lookup hash.

(Making it, in the degenerate case, a linear(-complexity) search,
vs. constant-time with the callsite section.)

> (so already at least additional 16 bytes),
> not counting malloc overhead for all the tiny objects.
> 
> The additional 8 byte static pointer
> is certainly not "doubling" that overhead.
> 
> But can be used to skip the additional lookup,
> sprintf, memcpy and whatnot, and even the function call,
> if the callsite at hand is currently disabled,
> which is probably the case for most >= trace
> callsites most of the time.
> 
> Any volunteers to benchmark the cpu usage?
> I think we'd need
> (trace logging: {enabled, disabled})
> x ({before 633f262,
> after 633f262,
> after 633f262 + lars patch})

Well, no numbers were presented even to support dropping the
callsite section case.  Otherwise the method could be just
repeated, I guess.
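
For readers following along, the "static pointer" idea amounts to
caching the callsite lookup at the call spot, roughly like this
(a sketch of the concept only, going from my reading of the libqb
headers, not the actual libqb/pacemaker macros):

    /* The first pass through this spot registers/looks up the callsite;
     * afterwards, a disabled callsite costs just a pointer test and a
     * targets check -- no hash lookup and no formatting. */
    #define my_log(priority, fmt, args...) do {                         \
            static struct qb_log_callsite *my_cs = NULL;                \
            if (my_cs == NULL) {                                        \
                my_cs = qb_log_callsite_get(__func__, __FILE__, fmt,    \
                                            priority, __LINE__, 0);     \
            }                                                           \
            if (my_cs != NULL && my_cs->targets != 0) {                 \
                qb_log_real_(my_cs, ##args);                            \
            }                                                           \
        } while (0)

The flip side, as noted below, is that a later qb_log_filter_ctl2() or
_custom_filter_fn() change has to reach such cached callsites by other
means than the per-invocation lookup.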

> BTW,
> I think without the "linker magic"
> (static array of structs),
> the _log_filter_apply() becomes a no-op?

I could possibly agree for the qb_log_callsites_register() flow
(we have just applied the filters and stuff, haven't we?),
but not for the qb_log_filter_ctl2() one.
At least without a closer look (so take it with a grain of salt).

> That's qb_log_filter_ctl2() at runtime.
> It would have to iterate over all the collision lists in all the
> buckets of the dynamically allocated callsites, instead of iterating
> the (now non-existing) static array of callsites.

Isn't that what it does now?  I mean, the only simplification would
be to peel off a callsite section indirection, since only
a single section is now carried?

> One side-effect of not using a static pointer,
> but *always* doing the lookup (qb_log_calsite_get()) again,
> is that a potentially set _custom_filter_fn() would be called
> and that filter applied to the callsite, at each invocation.
> 
> But I don't think that that is intentional?
> 
> Anyways.
> "just saying" :-)

There are more problems to be solved when switching to the static
pointer regarding "at least some continuity and room for future
optimizations", see the pressing one in the discussion along
("Note that this..."): https://github.com/ClusterLabs/libqb/issues/336

* * *

Thanks for pushing on this front, where rather impulsive changes
were made without a truly caring approach, my critical voice
notwithstanding.

-- 
Cheers,
Jan (Poki)


pgpBOkRds9H54.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [pacemaker] Discretion with glib v2.59.0+ recommended

2019-02-11 Thread Jan Pokorný
On 20/01/19 12:44 +0100, Jan Pokorný wrote:
> On 18/01/19 20:32 +0100, Jan Pokorný wrote:
>> It was discovered that this release of the glib project slightly
>> changed some parameters of how the distribution of values within
>> hash table structures works, undermining pacemaker's hard (alas
>> unfeasible) attempt to turn this data type into a fully predictable
>> entity.
>> 
>> Current impact is unknown besides some internal regression tests
>> failing due to this, so that, e.g., in the environment variables
>> passed in the notification messages, the order of the active nodes
>> (being a space-separated list) may appear shuffled in comparison
>> with the long-standing (and perhaps giving a false impression of
>> determinism) behaviour witnessed with older versions of glib in
>> the game.
> 
> Our immediate response is to, at the very least, make the
> cts-scheduler regression suite (the only localhost one that was
> rendered broken with 52 tests out of 733 failed) skip those tests
> where reliance on the exact order of hash-table-driven items was
> sported, so it won't fail as a whole:
> 
> https://github.com/ClusterLabs/pacemaker/pull/1677/commits/15ace890ef0b987db035ee2d71994e37f7eaff96
> [above edit: updated with the newer version of the patch]

Shout-out to Ken for fixing the immediate fallout (deterministic
output breakages in some cts-scheduler tests, making the above
change superfluous) for the upcoming 2.0.1 release!

>> Variations like these are expected, and you may take it as an
>> opportunity to fix incorrect order-wise (like in the stated case)
>> assumptions.
> 
> [intentionally CC'd developers@, should have done it since beginning]
> 
> At this point, testing with glib v2.59.0+, preferably using 2.0.1-rc3
> due to the release cycle timing, is VERY DESIRED if you are considering
> providing some volunteer capacity to pacemaker project, especially if
> you have your own agents and scripts that rely on the exact (and
> previously likely stable) order of "set data made linear, hence
> artificially ordered", like with OCF_RESKEY_CRM_meta_notify_active_uname
> environment variable in clone notifications (as was already suggested;
> complete list is also unknown at this point, unfortunately, for a lack
> of systemic and precise data items tracking in general).

While some of these, if not all, are now ordered, I'd still call
relying on a "stable ordered list" view of these variables, as opposed
to a "plain unordered set" one, from within agents continuously
frowned upon unless explicitly lifted -- for predictable
backward/forward pacemaker+glib version compatibility, if
for no other reason.

Ken, do you agree?

(If so, we shall keep that in mind for future documentation tweaks
[possibly also including OCF updates], so that no false assumptions
are cast for new agent implementations going forward.)
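
(For context, the root cause is that GHashTable iteration order is not
part of glib's contract at all; on the producing side, the only way to
hand out a stable listing is to sort explicitly before joining, along
the lines of the following -- a generic glib idiom shown purely for
illustration, not pacemaker's actual code:

    #include <glib.h>
    #include <string.h>

    /* Serialize a set of strings kept in a GHashTable into a
     * space-separated string with a glib-version-independent order. */
    static gchar *
    set_to_sorted_string(GHashTable *set)
    {
        GList *keys = g_list_sort(g_hash_table_get_keys(set),
                                  (GCompareFunc) strcmp);
        GString *out = g_string_new(NULL);

        for (GList *iter = keys; iter != NULL; iter = iter->next) {
            if (out->len > 0) {
                g_string_append_c(out, ' ');
            }
            g_string_append(out, (const gchar *) iter->data);
        }
        g_list_free(keys);
        return g_string_free(out, FALSE);
    }

Anything not sorted like that is fair game for reshuffling with any
glib upgrade, hence the "plain unordered set" stance above.)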

>> More serious troubles stemming from this expectation-reality mismatch
>> regarding said data type cannot be denied at this point, subject of
>> further investigation.  When in doubt, staying with glib up to and
>> including v2.58.2 (said tests are passing with it, though any later
>> v2.58.* may keep working "as always") is likely a good idea for the
>> time being.

I think this still partially holds and will only be proven fully
settled with time.  I mean, for anything truly reproducible (as in
crm_simulate): with pacemaker prior to 2.0.1, the glib generation
(pre-2.59.0 vs. 2.59.0 and later) needs to be combined uniformly
(reproducers need to follow the original) to get the same results,
whereas with pacemaker 2.0.1+, identical results (though possibly
differing from either of the former combos) will _likely_ be obtained
regardless of the particular run-time linked glib version -- but the
strength of this "likely" will only be established with future
experience, I suppose (it shall universally hold within the same glib
class per the stated division, though, so no change in this already
positive regard).

I've just scratched the surface, so I'll gladly be corrected.

-- 
Jan (Poki)


pgpBOgDwmAWdn.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [RFC][pacemaker] Antora as a successor for the current publication platform base on (abandoned?) publican

2019-01-22 Thread Jan Pokorný
On 17/01/19 21:00 +0100, Jan Pokorný wrote:
> For instance, also the Fedora project, ironically the one with the most
> intimate ties to this project, decided to ditch it in
> favour of Antora:
> 
> https://fedoramagazine.org/fedora-docs-overhaul/

[...]

> My ask is then: how do you feel about this possible change (addressing
> intentionally YOU on this very list, as an existing or possible future
> contributor), whether you know of some other tool comparable to
> publican, or whether you think we might be better served with some
> other approach to mastering publications with as little friction as
> possible (staying with AsciiDoc preferred for the time being), unless
> we get something really appealing in return (is there any cherry like
> that with, e.g., Sphinx?).

Just tossing it here for possible future reference: a more detailed
article linked from the Fedora Magazine post mentioned that Fedora
was also toying with what the OpenShift documentation uses:

https://github.com/redhataccess/ascii_binder

once they were clear publican is not viable going forward and before
sticking with Antora.  It doesn't look very maintained either, though,
and brings a whole new dependency avalanche with it (Ruby), too.

-- 
Nazdar,
Jan (Poki)


pgpzguzozsAmm.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [pacemaker] Discretion with glib v2.59.0+ recommended

2019-01-20 Thread Jan Pokorný
On 18/01/19 20:32 +0100, Jan Pokorný wrote:
> It was discovered that this release of the glib project slightly
> changed some parameters of how the distribution of values within
> hash table structures works, undermining pacemaker's hard (alas
> unfeasible) attempt to turn this data type into a fully predictable
> entity.
> 
> Current impact is unknown besides some internal regression tests
> failing due to this, so that, e.g., in the environment variables
> passed in the notification messages, the order of the active nodes
> (being a space-separated list) may appear shuffled in comparison
> with the long-standing (and perhaps giving a false impression of
> determinism) behaviour witnessed with older versions of glib in
> the game.

Our immediate response is to, at the very least, make the
cts-scheduler regression suite (the only localhost one that was
rendered broken with 52 tests out of 733 failed) skip those tests
where reliance on the exact order of hash-table-driven items was
sported, so it won't fail as a whole:

https://github.com/ClusterLabs/pacemaker/pull/1677/commits/d76a2614ded697fb4adb117e5a6633008c31f60e

> Variations like these are expected, and you may take it as an
> opportunity to fix incorrect order-wise (like in the stated case)
> assumptions.

[intentionally CC'd developers@, should have done it since beginning]

At this point, testing with glib v2.59.0+, preferably using 2.0.1-rc3
due to the release cycle timing, is VERY DESIRED if you are considering
providing some volunteer capacity to pacemaker project, especially if
you have your own agents and scripts that rely on the exact (and
previously likely stable) order of "set data made linear, hence
artificially ordered", like with OCF_RESKEY_CRM_meta_notify_active_uname
environment variable in clone notifications (as was already suggested;
complete list is also unknown at this point, unfortunately, for a lack
of systemic and precise data items tracking in general).

To do that, spinning a test cluster with the current Fedora Rawhide[*]
(that already ships glib v2.59 since beginning of this year) is
perhaps a most convenient option -- I've just built 2.0.1-rc3 packages
here so they will eventually get to the distribution mirrors, or you
can grab them for your architecture at
https://koji.fedoraproject.org/koji/buildinfo?buildID=1180970
right away.

[*] it shall be possible to point virt-install/respective dialog
in virt-manager to the direct location for Rawhide packages, see
https://fedoraproject.org/wiki/Releases/Rawhide#Point_installer_to_Rawhide

> More serious troubles stemming from this expectation-reality mismatch
> regarding said data type cannot be denied at this point, subject of
> further investigation.  When in doubt, staying with glib up to and
> including v2.58.2 (said tests are passing with it, though any later
> v2.58.* may keep working "as always") is likely a good idea for the
> time being.

-- 
Nazdar,
Jan (Poki)


pgppOkb8goUAX.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [RFC][pacemaker] Antora as a successor for the current publication platform base on (abandoned?) publican

2019-01-17 Thread Jan Pokorný
> Antora looks interesting. The biggest downside vs publican is that it
> appears to be only a static website generator, i.e. it would not
> generate PDF, epub, or single-page HTML the way we do now.

A couple of good questions were, coincidentally, raised "yesterday":
https://gitlab.com/antora/antora/issues/401

Nonetheless, I hadn't heard of that project until I checked the
details about the Fedora docs migration I vaguely knew about.
I now see a connection between that project and Asciidoctor (for
which we already introduced small compat changes), though.

-- 
Jan (Poki)


pgpKzr749VsjD.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] [RFC][pacemaker] Antora as a successor for the current publication platform base on (abandoned?) publican

2019-01-17 Thread Jan Pokorný
I am now talking about documents as available, e.g., at:

https://clusterlabs.org/pacemaker/doc/ (Versioned documentation)

Sadly, I've come to realize that publican is no longer being
developed, and while this alone is bearable since it fulfills
its role well, worse, some distros are not (going to be) packaging
it anymore.  Also, think of staying up-to-date with target formats
and "pleasing aesthetics of the decade".

For instance, also the Fedora project, ironically the one with the most
intimate ties to this project, decided to ditch it in
favour of Antora:

https://fedoramagazine.org/fedora-docs-overhaul/

At first sight, getting rid of publican looked good -- the less
extensive the dependencies (like the Perl ecosystem), the better.  But the
crux is that Antora is possibly even worse in this regard :-D
A good thing about Antora, though, is that it natively works
with AsciiDoc formatted files, just as we already do, e.g.:

https://github.com/ClusterLabs/pacemaker/tree/Pacemaker-2.0.1-rc2/doc/Pacemaker_Explained/en-US


My ask is then: how do you feel about this possible change (addressing
intentionally YOU on this very list, as an existing or possible future
contributor), whether you know of some other tool comparable to
publican, or whether you think we might be better served with some
other approach to mastering publications with as little friction as
possible (staying with AsciiDoc preferred for the time being), unless
we get something really appealing in return (is there any cherry like
that with, e.g., Sphinx?).

I figure also downstream has possibly something to say here
if they are after shipping such handbooks as well.

Thanks for your inputs.

-- 
Jan (Poki)


pgpySpdkA5RCF.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Heads up for potential Pacemaker API change

2018-11-02 Thread Jan Pokorný
On 01/11/18 16:41 -0500, Ken Gaillot wrote:
> I ran into a situation recently where a fix would require changing
> libpe_status's pe_working_set_t data type.
> 
> For most data types in the Pacemaker API, we require (usually by
> documented policy rather than code) that library-provided
> constructors be used to allocate them. That allows us to add new
> members at the end of structs without existing applications needing
> to be rebuilt.

Note this is not a panacea unless the struct definition is moved to
a private-only header and the respective pointers are all that's exposed
in the public API.  So currently, client programs can just as well get
broken on future struct expansion (imagine an array of structs).

> A bit of searching turned up only sbd, fence-virt, and pacemaker-mgmt
> using libpe_status (and I'm not sure pacemaker-mgmt is still active).
> But I'm curious if anyone has custom applications that might be
> affected, or has an opinion on the problem and solution here.

fence-virt (it never occurred to me that it ever had a pacemaker
backend!) was inevitably broken since 1.1.8 at the latest, due to the
renaming of the "new_ha_date" function:
https://github.com/ClusterLabs/pacemaker/commit/9d2805ab00a117ddf3d1c67e2383c7778a81230f#diff-917f93b8d6f4434bbf21cd5b8240895cL1044
hence I bet it was never actively used (HA cluster of virtual nodes
spread across clustered hypervisors and/or even mixed topologies?
is the idea worth reviving?).

-- 
Nazdar,
Jan (Poki)


pgpZzOgUePCnX.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [openstack-dev] [HA] future of OpenStack OCF resource agents (was: resource-agents v4.2.0)

2018-11-02 Thread Jan Pokorný
On 01/11/18 17:07 -0500, Ken Gaillot wrote:
> FYI there is further discussion happening on the PR:
> 
> https://github.com/ClusterLabs/resource-agents/pull/1147
> 
> I think we have multiple issues we're trying to solve:
> 
> 1. Discoverability in terms of users knowing what agents may be
> available for any given purpose
> 
> 2. Organization of installed agents into categories for display by
> tools (especially GUIs)
> 
> 3. Packaging of agents in a way that avoids dragging in unnecessary
> dependencies
> 
> 4. Then the original problem that the provider field solved, which was
> to allow unrelated organizations to create their own sets of resource
> agents, whether for internal use or public distribution. This has
> become less of a problem since the consolidation of the cluster stack,
> but is still useful, especially for local customizations of stock
> agents.
> 
> #1 and #4 conflict to some extent. I think the whole point of a
> standard (OCF in this case) is to not require a central authority, so
> we should allow for independent providers who may not be closely tied
> into the ClusterLabs community. But we don't want users to be lost in
> a sea of unconnected repos, either.
> 
> I don't see an obvious solution for #1 (discoverability). We could
> document known other repos in the resource-agents README as some have
> suggested, or some other common location such as clusterlabs.org, but
> will users know to look there?

The main problem with discoverability is that, with the upstream hat on
(both these are getting intertwined contentiously, though the context
matters, and only the former should be of utmost concern here), you want
to serve each and every cluster stack user, regardless of platform and
its conventions for procurement of the software bits, while making it
immediately actionable without the hassle of finding intermediate manual
steps.  On the other hand, there is a heterogeneous set of
packages.

Luckily, no wheel reinventing is needed; the Zero Install project looks
truly appealing in this regard: https://0install.net/

- atomic SW pieces/packages are offered by the means of XMLs
  (hence user scripting friendly) that can also be served in
  a web browser where they render as human friendly pages
  thanks to XSLT, see an example:
  http://roscidus.com/0mirror/sites/site-rox.sourceforge.net.html

- feeds can refer to particular platform-native packages
  that will be preferred when possible, and for yet unsupported
  platforms, there could be some good enough generalized recipe
  (download, prepare if needed, pick the suitable files):
  https://0install.net/distribution-integration.html#idm44

- heterogenous package sources are the core, so non-resource-agents
  agents would no longer be ostracized as long as someone cares
  to send submissions with the feeds for the external projects
  (it could easily be in the form of git-repo-backed website,
  so the workflow for such new submissions would be trivial)

Perhaps it would be a good idea to eventually put such feeds together
to be served under clusterlabs.org domain, akin to one-stop shops
("app stores") that everyone is so familiar with these days?

For that to work with enough granularity (of semi-standalone agents),
it's vital to push downstreams to start packaging agents at this
granular level (now, I am looking at resource-agents), so that the
you-only-get-what-you-ask-for use case is possible at all.

With this in place and properly advertised, I believe the problem
of where to stick and maintain particular agents would become
really minor as long as shared dependencies are stabilized.
Also, all agents would be comparably equal in the feeds' listing,
which I think would be a leap towards democratizing the landscape
(e.g., when someone is a dedicated author of some agent to scratch
her itch, while preferring an indepedent, autonomous yet publicized
development).

And indeed, rinse and repeat for any other agents (fence agents,
alert handlers), and in turn, let people combine available kit
components as they like.

> For #2 (organization), some possibilities are:
> 
> - add a category field in the RA meta-data
> - extend the RA naming to include a category, e.g.
> ocf:clusterlabs:networking/IPaddr2
> - repurpose the provider field as a category name
> 
> The first is cleaner and works unchanged with existing tools, but it
> requires any tool that wants to use it to read all agents' meta-data at
> start-up. I'm not sure if that's reasonable or not. The second allows
> more efficient listing (just regular old subdirectories) but may
> require changes to the standard as well existing tools. I'm not fond of
> the third, because it then loses the ability to solve #4. Of course I'm
> open to other possibilities too :-)
> 
> I'm not sure how much a problem #3 (packaging) is. Just because an
> agent manipulates service X doesn't mean it needs to depend on the X
> package;

Again, this is a downstream discussion not very appropriate here,
but for the 

Re: [ClusterLabs Developers] [HA] future of OpenStack OCF resource agents (was: resource-agents v4.2.0)

2018-10-24 Thread Jan Pokorný
On 24/10/18 14:42 +0200, Valentin Vidic wrote:
> On Wed, Oct 24, 2018 at 01:25:54PM +0100, Adam Spiers wrote:
>> No doubt I've missed some pros and cons here.  At this point
>> personally I'm slightly leaning towards keeping them in the
>> openstack-resource-agents - but that's assuming I can either hand off
>> maintainership to someone with more time, or somehow find the time
>> myself to do a better job.
>> 
>> What does everyone else think?  All opinions are very welcome,
>> obviously.
> 
> Well, I can just comment that with all the python agents coming in,
> the resource-agents package is getting a bit heavy on the dependencies
> (at least in Debian) so we might decide to split it at some point in
> the future.

At least packaging-wise, I think it would certainly be helpful to
split the current monolith of resource-agents.  Luckily, streamlined
user experience (catalogue of resources that are readily configurable)
is not necessarily in opposition to deliberate picking of particular
cherries from the basket (weak dependencies, catch-all meta packages
like it exists with fence-agents-all dependencies-only RPM, etc.).
Exactly to avoid the dependency creep.

Sorry for little off-topic, I don't have any opinion on the main
discussed matter.

-- 
Nazdar,
Jan (Poki)


pgp1oVZjqwz72.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [RFC] Time to migrate authoritative source forge elsewhere?

2018-10-23 Thread Jan Pokorný
On 08/06/18 00:21 +0200, Jan Pokorný wrote:
> On 07/06/18 15:40 -0500, Ken Gaillot wrote:
>> On Thu, 2018-06-07 at 11:01 -0400, Digimer wrote:
>>> I think we need to hang tight and wait to see what the landscape
>>> looks like after the dust settles. There are a lot of people on
>>> different projects under the Clusterlabs group. To have them all
>>> move in coordination would NOT be easy. If we do move, we need to
>>> be certain that it's worth the hassle and that we're going to the
>>> right place.
>>> 
>>> I don't think either of those can be met just now. Gitlab has had
>>> some well publicized, major problems in the past. No solution I
>>> know of is totally open, so it's a question of "picking your
>>> poison" which doesn't make a strong "move" argument.
>>> 
>>> I vote to just hang tight, say for 3~6 months, then start a new
>>> thread to discuss further.
>> 
>> +1
>> 
>> I'd wait until the dust settles to see if a clear favorite emerges.
>> Hopefully this will spur the other projects to compete more strongly on
>> features.
>> 
>> My gut feeling is that ClusterLabs may end up self-hosting one or
>> another of the open(ish) projects; our traffic is low enough it
>> shouldn't involve much admin. But as you suggested, I wouldn't look
>> forward to the migration. It's a time sink that means less coding on
>> our projects.
> 
> Hopefully not at all:
> https://docs.gitlab.com/ce/user/project/import/github.html
> 
> Btw. just to prevent any sort of squatting, I've registered
> https://gitlab.com/ClusterLabs & sharing now the intended dedication
> of this namespace publicly in a signed email in case it will turn
> up useful and the bus factor or whatever kicks in.

I guess you could see this thread bump coming, but with the recent
lack of HA with GitHub [1,2] (some rumours guessed that the problem
in question might have something to do with split brain scenarios
-- what a fitting reminder of the consequences, isn't it?), it's
a new opportunity to possibly get the ball slowly rolling and
reconsider where the biggest benefits vs. losses (e.g. suboptimal
merge reviews) possibly are and whether it's not the suitable time
for action now.

[1] https://blog.github.com/2018-10-21-october21-incident-report/
[2] https://blog.github.com/2018-10-22-incident-update/

-- 
Nazdar,
Jan (Poki)


pgpA_6CfSt7sX.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] pacemakerd: error: sysrq_init: Cannot write to /proc/sys/kernel/sysrq: Permission denied (13)

2018-09-19 Thread Jan Pokorný
On 19/09/18 14:47 +0800, zhongbin wrote:
>   More detail:
> my  operating system  is  Debian 8  (jessie) . 
> 
> At 2018-09-19 14:00:42, "钟彬"  wrote:
> 
> When I use a non - root user to start pacemaker-2.0.0,

Running pacemaker as non-root is not a good choice, I am afraid.

It simply wasn't designed to run like that, since the vast majority
of the resources to be managed in HA fashion (purpose of pacemaker)
will require some portion of extra privileges, so the actual
progression regarding privileges is to start with a full sack
only to gradually drop what's not needed (akin to "least privilege"
principle) -- either in pacemaker's own set of auxiliary daemons
or internally in the resources themselves.

The other justification is that for HA clustering to be meaningful,
you need some kind of isolation of broken hosts, and how much sense
does it make to _not_ allow enough privileges to pacemaker while at
the same time allowing it to cut off these machines incl. self
(which is being attempted in your very case, to solve something
very unexpected -- not having enough privileges is likely one such
case)?

> "pacemakerd:  error: sysrq_init: Cannot write to /proc/sys/kernel/sysrq: 
> Permission denied (13)" 
> appears  in pacemaker.log.
> Some  other "Permission denied"  problems ware resolved by using
> "setcap" command to  enable some capabilities.  But the above
> problem cannot be solved.

Well, your run of pacemaker is getting to a really unsolvable
situation when it takes the code path allowing for such a message,
so even if you manage to overcome that denial with some other
capabilities artificially granted, your machine will likely just
be rebooted.

If I were you, I'd stop going down that rabbit hole and simply
run pacemaker as root.  The workaround chain for your current
approach doesn't seem to be worth the hassle, and is in conflict
with what pacemaker is meant to be used for.

> "Cannot write to /proc/sys/kernel/sysrq" was printed when  calling
> the function  sysrq_init.
> [1]https://github.com/ClusterLabs/pacemaker/blob/e8b96015f5e709de29f8e84fc78387796d31b4da/lib/common/watchdog.c#L69

Not that it should help in your scenario, but I realized that perhaps
fewer writes are better regarding various Linux security modules,
auditing, etc., and any sort of race condition is not imminent
(at worst racing with the sibling processes with the same intent):
https://github.com/ClusterLabs/pacemaker/pull/1590

> Can you give me some suggestions to solve the problem. Is
> sysrq_init  necessary,can I  Ignore the error.

See above, you likely won't get anywhere even if you ignore that
error.

-- 
Nazdar,
Jan (Poki)


pgpRIzlWY9UBV.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] CIB daemon up and running

2018-08-13 Thread Jan Pokorný
On 13/08/18 10:19 -0500, Ken Gaillot wrote:
> On Mon, 2018-08-13 at 05:36 +, Rohit Saini wrote:
>> Gentle Reminder!!
>>  
>> From: Rohit Saini 
>> Sent: 31 July 2018 10:34
>> To: 'developers@clusterlabs.org' 
>> Subject: CIB daemon up and running
>>  
>> Hello,
>>  
>> After “pcs cluster start”, how would I know if my CIB daemon has come
>> up and is initialized properly.
>> Currently I am checking output of “cibadmin -Q” periodically and when
>> I get the output, I consider CIB daemon has come up and initialized.
>>  
>> Is there anything better than this? I am looking for some
>> optimizations with respect to above.
>>  
>>  
>> Thanks,
>> Rohit
> 
> That's probably the best way available currently. You could copy the
> source code of cibadmin and modify it to do the query in a loop until
> successful, if you wanted to make it more convenient.

IOW, the plain polling mentioned in my previous reply
https://lists.clusterlabs.org/pipermail/developers/2018-July/001271.html

Have you missed that, Rohit?  What's your ultimate objective?

-- 
Nazdar,
Jan (Poki)


pgpnaDUTCmFNg.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] CIB daemon up and running

2018-07-31 Thread Jan Pokorný
Hello Rohit,

On 31/07/18 05:03 +, Rohit Saini wrote:
> After "pcs cluster start", how would I know if my CIB daemon has
> come up and is initialized properly.
> Currently I am checking output of "cibadmin -Q" periodically and
> when I get the output, I consider CIB daemon has come up and
> initialized.
> 
> Is there anything better than this? I am looking for some
> optimizations with respect to above.

The natural question here: what's your wider goal here?

Do you want to establish the connection with CIB (the daemon
got renamed to pacemaker-based since 2.0) as soon as it's possible
as an readiness indication for your scripting/application on top
of pacemaker?  I actually suspect we are back in automation waters
(Ansible?)...  Then, users list might be actually more suitable
venue to discuss this (CC'd).

The client/server arrangement of local inter-process communication
will hardly allow for anything better than polling at this time,
the one exception being a slight possibility of using inotify to hook
in when the /dev/shm/qb-cib_* file gets created.  That would, however,
rely on some kind of implementation detail, which is indeed
discouraged, as it's generally a moving target.
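
Purely to illustrate that inotify idea (and to stress once more that
the file name is an implementation detail you should not build upon),
a minimal sketch could look like this:

    #include <stdio.h>
    #include <string.h>
    #include <sys/inotify.h>
    #include <unistd.h>

    /* Watch /dev/shm for the CIB IPC endpoint to appear; the "qb-cib"
     * name prefix is an undocumented libqb/pacemaker detail. */
    int
    main(void)
    {
        char buf[4096]
            __attribute__((aligned(__alignof__(struct inotify_event))));
        int fd = inotify_init1(0);

        if (fd < 0 || inotify_add_watch(fd, "/dev/shm", IN_CREATE) < 0) {
            return 1;
        }
        for (;;) {
            ssize_t len = read(fd, buf, sizeof(buf));

            if (len <= 0) {
                break;
            }
            for (char *p = buf; p < buf + len; ) {
                struct inotify_event *ev = (struct inotify_event *) p;

                if (ev->len > 0 && strncmp(ev->name, "qb-cib", 6) == 0) {
                    printf("CIB IPC endpoint appeared: %s\n", ev->name);
                    close(fd);
                    return 0;
                }
                p += sizeof(*ev) + ev->len;
            }
        }
        close(fd);
        return 1;
    }

Again, this only tells you the IPC endpoint exists, not that the daemon
already answers queries, so the "cibadmin -Q" style of polling remains
the safer answer.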

If there's a demand, we could possibly add a DBus interface and
emit signals about events like these -- that might also alleviate
busy waiting/polling (or unreliable guarantees) in crm/pcs when
it comes to cases like this as well, I guess.  Or is there any
better idea on how to move towards a fully event-driven system?

-- 
Nazdar,
Jan (Poki)


pgpmNYkO5tQH3.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] [questionnaire] Do you overload pacemaker's meta-attributes to track your own data?

2018-06-28 Thread Jan Pokorný
Hello, and since it is a month since the preceding attempt to gather
some feedback, welcome to yet another simple set of questions that
I will be glad to have answered by as many of you as possible,
as an auxiliary indicator of what's generally acceptable and what's not
within the userbase.

This time, I need to introduce the context of the questions, since that's
important, and I am sorry it's rather long (feel free to skip down
to the same original indentation level if you are pressed for time):

  As you've surely heard when in touch with pacemaker, there's
  a level of declarative annotations for resources (whether primitive
  or otherwise), their operations and a few other entities.  You'll
  find which ones (which identifiers in variable assignments emulated
  with identifier + value pairs) can be effectively applied in which
  context in the documentation[1] -- these are comprehended by
  pacemaker and put into resource allocation equations.

  Perhaps less known is the fact that these sets are open to possibly
  foreign, user-defined assignments that may effectively overload
  the primary role of meta-attributes, dragging user-defined semantics
  there.  There may be warnings about doing so at the high-level
  management tools, but pacemaker won't protest by design, as this
  is also what allows for smooth configuration reuse with various
  point releases possibly acquiring new meanings for new identifiers.

  This possibility of free-form consumer extensibility doesn't appear
  to be advertised anywhere (perhaps to prevent people confusing the CIB,
  the configuration hierarchy, with a generic key-value store, which it
  is rather not), and within the pacemaker configuration realms, it
  wasn't useful until it started to be an optional point of interest
  in location constraints thanks to the ability to refer to meta-attributes
  in the respective rules based on "value-source" indirection[2],
  which arrived with pacemaker 1.1.17 (see the example sketched at the
  end of this context block).

  More experienced users/developers (intentionally sent to both lists)
  may already start suspecting potential namespace collisions between
  a narrow but possibly growing set of identifiers claimed by pacemaker
  for its own (and here original) purpose, and those that are added
  by users, either so as to appear in the mentioned constraint rules
  or for some other, possibly external, automation-related purpose.

  So, I've figured out that with upcoming 2.0 release, we have a nice
  opportunity to start doing something about that, and the least
  effort, fully backward + tooling compatible, that would start
  getting us to a conflict-less situation is, in my opinion, to start
  actively pushing for a lexical cut, asking for a special
  prefix/naming convention for the mentioned custom additions.
  
  This initiative is meant to consist of two steps:
  
  a. modify the documentation to expressly detail said lexical
 requirement
 - you can read draft of my change as a pull request for pacemaker:
   https://github.com/ClusterLabs/pacemaker/pull/1523/files
   (warning: the respective discussion was somewhat heated,
   and is not a subject of examination nor of a special interest
   here), basically I suggest "x-*" naming, with full recommended
   convention being "x-appname_identifier"
  
  b. add a warning to the logs/standard error output (daemons/CLI)
 when not recognized as pacemaker's claimed identifier nor
 starting with dedicated prefix(es), possibly referring to
 the documentation stanza per a., in a similar way the user
 gets notified that no fencing devices were configured
 - this would need to be coded
 - note that this way, you would get actually warned about
   your own typos in the meta-attribute identifiers even
   if you are not using any high-level tooling

  This may be the final status quo, or the eventual separation
  of the identifiers makes it really easy to perform other schema
  upgrade related steps with future major schema version bumps
  _safely_.  Nobody is immediately forced to anything, although
  the above points should make it clear it's prudent to get ready
  (e.g. also regarding the custom tooling around that) in respect
  to future major pacemaker/schema version bumps and respective
  auto-upgrades of the configuration (say it will be declared
  it's valid to upgrade to pacemaker 3.0 only from as old pacemaker
  as 2.0 -- that's the justification for acting _now_ with preparing
  sane grounds slowly).
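
  To make the above tangible (all names below are made up, merely
  illustrating the prefix convention from a. together with the
  "value-source" indirection mentioned earlier; the XML is believed
  to match the current schema, but double-check against the
  documentation before reuse):

    <primitive id="www" class="ocf" provider="pacemaker" type="Dummy">
      <meta_attributes id="www-meta">
        <!-- custom, "x-" prefixed meta-attribute owned by "myapp" -->
        <nvpair id="www-meta-site" name="x-myapp_site" value="paris"/>
      </meta_attributes>
    </primitive>
    <rsc_location id="www-loc" rsc="www">
      <rule id="www-loc-rule" score="INFINITY">
        <!-- compare the node attribute "site" against the value of
             the resource's meta-attribute named "x-myapp_site" -->
        <expression id="www-loc-expr" attribute="site" operation="eq"
                    value="x-myapp_site" value-source="meta"/>
      </rule>
    </rsc_location>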

* * *

So now the promised questions; just send a reply where you [x] tick
your selections for the questions below, possibly with some more
commentary on the topic, and preferably on-list (a single list of your
choice is enough):

1. In your cluster configurations, do you carry meta-attributes
   other than those recognized by pacemaker?

   [ ] no

   [ ] yes (if so, can you specify whether for said constraints
rules, as a way to permanently attach some kind of
administrative piec

Re: [ClusterLabs Developers] [RFC] Time to migrate authoritative source forge elsewhere?

2018-06-08 Thread Jan Pokorný
On 07/06/18 11:10 +, Nils Carlson wrote:
> On 2018-06-07 08:58, Kristoffer Grönlund wrote:
>> Jan Pokorný  writes:
>>> AFAIK this doesn't address the qualitative complaint I have.  It makes
>>> for a very poor experience when there's no readily available way to
>>> observe evolution of particular patchsets, only to waste time of the
>>> reviewer or contribute to oversights ("I'll skip this part I am sure
>>> I reviewed already, if there was a generational diff, I'd have a look,
>>> but the review is quite a pain already, I'll move on").
>>> No, setting up a bot to gradually capture work in progress is not
>>> a solution.  And pull-request-per-patchset-iteration sounds crazy
>>> considering this count sometimes goes pretty high.
>>> 
>> 
>> I'll confess that I have no experience with Gerrit or the Github
>> required reviews, and I don't really know how they differ. :)
> 
> 
> Adding some info as these are things I know something about.
> 
> Gitlab & Github are very similar, but I much prefer Gitlab after having used
> both.
> 
> For open-source projects Gitlab gives you all features, including things
> like "approvers" for merge-requests. They have a nice permission model which
> allows only some users to approve merge requests and to set a minimum number
> of approvers.
> 
> The fundamental unit of review in Gitlab is the merge-request, requesting
> that a branch be merged into another. This works very well in practice. You
> can configure a regex for branch names and only allow users to push to
> branches with a prefix like "contributions/", making all other branches
> "protected", i.e. prevent direct pushes.
> 
> The code-review is good, but could be better. Every time you update the
> branch (either amending a commit or pushing a new commit) this creates a new
> "version" of the merge-request that you can diff against previous versions.

I must admit, this would be a killer feature for me (see the above
rant) and the best trade-off if willingness to try/adopt Gerrit
is unlikely.

> The bad thing here is that comments are not always carried over as they
> should be. There is also no way of marking a file as reviewed, so large
> reviews can be cumbersome. The good news is that this stuff is improving
> slowly.
> 
> Gerrit is a much more powerful tool for code-review. The workflow is less
> intuitive however and has a far higher learning curve. It requires specific
> hooks to be installed to work well and works by a "patch-set" concept. You
> push your changes to a "for" branch, i.e. "for-master" and they then end up
> on an unnamed branch on the server in a review. From there they can be
> pulled and tested.
> 
> The code-review is top-notch, with comments attached to a version of the
> patch-set and intra-version diffs being quick and elegant.
> 
> The negative sides of Gerrit typically outweigh the positive for most
> organizations I'm afraid:
> 
> - No central hosting like gitlab.com.
> - High threshold for new contributors (unusual workflow, hooks needed. )
> - No bugs/issues etc. But good jira integration.
> 
> I haven't tried pagure. There is also gitea which looks promising. And
> bitbucket.

Thanks for sharing your thoughts, Nils, appreciated.

P.S. Your post may be stuck in the moderation queue, hopefully this
is resolved soon (as a rule of thumb, I recommend subscribing to
the particular list first if not already, but there can be additional
anti-spam measures for first-time/unrecognized posters).

-- 
Poki


pgpJ29F73cJD9.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [RFC] Time to migrate authoritative source forge elsewhere?

2018-06-07 Thread Jan Pokorný
On 07/06/18 15:40 -0500, Ken Gaillot wrote:
> On Thu, 2018-06-07 at 11:01 -0400, Digimer wrote:
>> I think we need to hang tight and wait to see what the landscape
>> looks like after the dust settles. There are a lot of people on
>> different projects under the Clusterlabs group. To have them all
>> move in coordination would NOT be easy. If we do move, we need to
>> be certain that it's worth the hassle and that we're going to the
>> right place.
>> 
>> I don't think either of those can be met just now. Gitlab has had
>> some well publicized, major problems in the past. No solution I
>> know of is totally open, so it's a question of "picking your
>> poison" which doesn't make a strong "move" argument.
>> 
>> I vote to just hang tight, say for 3~6 months, then start a new
>> thread to discuss further.
> 
> +1
> 
> I'd wait until the dust settles to see if a clear favorite emerges.
> Hopefully this will spur the other projects to compete more strongly on
> features.
> 
> My gut feeling is that ClusterLabs may end up self-hosting one or
> another of the open(ish) projects; our traffic is low enough it
> shouldn't involve much admin. But as you suggested, I wouldn't look
> forward to the migration. It's a time sink that means less coding on
> our projects.

Hopefully not at all:
https://docs.gitlab.com/ce/user/project/import/github.html

Btw. just to prevent any sort of squatting, I've registered
https://gitlab.com/ClusterLabs & sharing now the intended dedication
of this namespace publicly in a signed email in case it will turn
up useful and the bus factor or whatever kicks in.

-- 
Poki


pgp8i6_y5gKho.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [RFC] Time to migrate authoritative source forge elsewhere?

2018-06-07 Thread Jan Pokorný
On 04/06/18 09:23 +0200, Jan Pokorný wrote:
> As a second step, it might also be wise to start offering release
> tarballs elsewhere, preferably OpenPGP-signed proper releases
> (as in "make dist" or the like) -- then it can be served practically
> from whatever location without imminent risk of being tampered with.

Meanwhile in Gitea land (another alternative for self-hosting):
https://github.com/go-gitea/gitea/issues/4167

A practical demonstration of why to sign releases (tags, commits...), and
why the permissions aspect of mixing proprietary and self-managed services
sucks.

-- 
Poki


pgpI_015ZHwI5.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [RFC] Time to migrate authoritative source forge elsewhere?

2018-06-07 Thread Jan Pokorný
On 07/06/18 08:48 +0200, Kristoffer Grönlund wrote:
> Jan Pokorný  writes:
>> But with the latest headlines on where that site is likely headed,
>> I think it's a great opportunity for us to possibly jump on the
>> bandwagon inclined more towards free (as in freedom) software
>> principles.
>> 
>> Possible options off the top of my head:
>> - GitLab, pagure: either their authoritative sites or self-hosted
>> - self-hosted cgit/whatever
>> 
>> It would also allow us to reconsider our workflows, e.g. using gerrit
>> for patch review queue (current silent force-pushes is a horrible
>> scheme!).
>> 
> My general view is that I also feel (and have felt) a bit uneasy about
> free software projects depending so strongly on a proprietary
> service. However, unless self-hosting, I don't see how f.ex. GitLab is
> much of an improvement

Open-core business approach aside, as a perhaps necessary downside at
these scales, the difference is crucial: Community Edition is open
source, anyone can host it individually, which is what enabled
both Debian and GNOME to consider its usage (it became a reality
for the latter: https://gitlab.gnome.org/explore/groups,
https://www.gnome.org/news/2018/05/gnome-moves-to-gitlab-2/)

Feature-wise:
https://wiki.debian.org/Alioth/GitNext/GitLab
https://wiki.debian.org/Alioth/GitNext
https://wiki.gnome.org/Initiatives/DevelopmentInfrastructure/FeatureMatrix

> (Pagure might be a different story, but does it offer a comparable
> user experience?) in that regard, and anything hosted on "public"
> cloud is basically the same. ;)

Pagure has the benefit you can influence it relatively easily, as
I directly attested :-)

> crmsh used to be hosted at GNU Savannah, which is Free with a capital F,
> but the admin experience, user experience and general discoverability in
> the world at large all left something to be desired.
> 
> In regard to workflows, if everyone agrees, we should be able to improve
> that without moving. For example, if all changes went through pull
> requests, there is a "required reviews" feature in github. I don't know
> if that is something everyone want, though.
> 
> https://help.github.com/articles/enabling-required-reviews-for-pull-requests/

AFAIK this doesn't address the qualitative complaint I have.  It makes
for a very poor experience when there's no readily available way to
observe evolution of particular patchsets, only to waste time of the
reviewer or contribute to oversights ("I'll skip this part I am sure
I reviewed already, if there was a generational diff, I'd have a look,
but the review is quite a pain already, I'll move on").
No, setting up a bot to gradually capture work in progress is not
a solution.  And pull-request-per-patchset-iteration sounds crazy
considering this count sometimes goes pretty high.


In the short term, I'd suggest concentrating on the two points I raised:
- good discipline regarding commit messages
- more systemic approach to release tarballs if possible

-- 
Poki


pgpY3msKcTLPo.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] [RFC] Time to migrate authoritative source forge elsewhere?

2018-06-04 Thread Jan Pokorný
Good Monday morning,

almost half a year ago, when I was writing the lines below in
a response to a tangential topic, I wouldn't have believed we would
be so close to reconsidering the stay on GitHub (GH),
said proprietary service [1]:

On 09/01/18 15:37 +, Adam Spiers wrote:
> Jan Pokorný  wrote:
>>come on people, when the code base is to stand the test of time,
>>is it more likely that the context survives in the proprietary
>>free-of-charge service [as a comment at the pull request] without
>>massive replication, or in the bits being indivisible part of the
>>distributed repo?
> 
> +100.  This is mentioned here too:
> 
> https://wiki.openstack.org/wiki/GitCommitMessages#Information_in_commit_messages

But with the latest headlines on where that site is likely headed,
I think it's a great opportunity for us to possibly jump on the
bandwagon inclined more towards free (as in freedom) software
principles.

Possible options off the top of my head:
- GitLab, pagure: either their authoritative sites or self-hosted
- self-hosted cgit/whatever

It would also allow us to reconsider our workflows, e.g. using gerrit
for patch review queue (current silent force-pushes is a horrible
scheme!).

GitHub could stay as a mirror-only location, referring to the proper
location where all the activities take place.

Anyway, as a first step, pretty please, do not leave anything of
informational value as a mere comment on the pull request; put it where it
belongs -- as mentioned above -- in the commit message proper.
Also referring to the project issue tracker at GH (e.g. for libqb)
should not be at the expense of omitting important points, again,
in the commit message.

As a second step, it might also be wise to start offering release
tarballs elsewhere, preferably OpenPGP-signed proper releases
(as in "make dist" or the like) -- then it can be served practically
from whatever location without imminent risk of being tampered with.

Ideas/comments/feedback?


[1] https://lists.clusterlabs.org/pipermail/developers/2018-January/001183.html

-- 
Jan (Poki)


pgpYrv0VVrwFb.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] heads-up: procps-ng (notably "ps" tool) as a DoS vector

2018-05-21 Thread Jan Pokorný
Hello (intentionally primarily) cluster stack development forces,

I came across http://www.openwall.com/lists/oss-security/2018/05/17/1,
which seems to indicate it is fairly trivial for an unprivileged user
on an unpatched and non-hardened Linux system using procps-ng as its
primary package for process listing utilities to block "ps" invocation,
amongst other nasty findings.

I hope it's quite needless to state why this is of importance for
HA cluster users -- stock resource agents often rely on process
listing facilitated with said tool.  Fortunately, query-by-PID
looks affected only infinitesimally.  Some agents, however, do
exercise a full-breadth search with later filtering per occurrences
of some string patterns, which is, to some extent, a broken approach
even without that security advisory in the picture (see also [1];
it all basically amounts to a weak, fragile grip on processes, at least
when portability is important).  Note that not using "ps" directly
doesn't imply anything, as libprocps shipped with procps-ng can be
used through language bindings under the hood.

So my call is that it'd be wise to revisit the usage of ps-like
commands in the agents, in order to keep the surface possibly
affecting the run of the resources as small as possible.  The opposite
approach, an explicit claim that any reliability of the cluster
stack is void when an arbitrarily locked-down external user happens
to have login access to the machine, would also be valid
("no on-host privilege separation is the safest one"), but then
why bother with ACLs in pacemaker, etc.

Note that it is currently unknown whether only agents'
implementations are hit, as pacemaker itself does some /proc
traversal, though not with the help of libprocps, but on its own.
It looks like at least a negligible race condition may be fixed
on the kernel's side (CVE-2018-1121) and pacemaker would directly
benefit from that, but that's just a drop in the ocean compared
to pre-existing /proc handling issues.

It's likewise unknown to what extent other systems and other process
listing utilities are affected.


[1] parenthesised part of
https://oss.clusterlabs.org/pipermail/developers/2017-July/001098.html

-- 
Jan (Poki)


pgpsBifRsq1wO.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Impact of changing Pacemaker daemon names on other projects?

2018-04-16 Thread Jan Pokorný
On 16/04/18 14:32 +0200, Klaus Wenninger wrote:
> On 04/16/2018 01:52 PM, Jan Pokorný wrote:
>> On 29/03/18 11:13 -0500, Ken Gaillot wrote:
>>> 4. Public API symbols: for example, crm_meta_name() ->
>>> pcmk_meta_name(). This would be a huge project with huge impact, and
>>> will definitely not be done for 2.0.0. We would immediately start using
>>> the new convention for new API symbols, and more slowly update existing
>>> ones (with compatibility wrappers for the old names).
>> 
>> Value added here would be putting some commitment behind the "true
>> public API" when the symbols get sifted carefully, leaving some other
>> naming prefixes reserved for private only ones (without any commitment
>> whatsoever).
> 
> Like e.g. pcmk_* & pcmkpriv_*  (preferably something shorter
> for the latter) ?

Yes, something like that (pcmk_* vs. anything not starting with "pcmk_"
might suffice), which would allow for compiling library(ies) twice
-- once for public use (only "public API" symbols visible), once
for pacemaker's own usage (libpcmk_foo_private.so, everything non-static
visible).  That might be a first step towards something supportable,
start with literally nothing in the public version, gradually grow the
numbers, with almost no hassle other than adding symbols to an external
list and/or renaming formerly private-only symbols so as to match the
regexp/glob.  All native executables would naturally link against
libpcmk_foo_private versions.  Later on, these can be merged or
otherwise restructured.
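
To make the dual-build idea more tangible, the mechanism could be as
simple as the following (a sketch only; PCMK_API, PCMK_PUBLIC_BUILD and
the function names are all hypothetical):

    /* Public build: compiled with -fvisibility=hidden, so only symbols
     * explicitly marked PCMK_API get exported from libpcmk_foo.so.
     * Private build: compiled without that flag, PCMK_API expands to
     * nothing and every non-static symbol stays visible, which is what
     * pacemaker's own daemons would link against
     * (libpcmk_foo_private.so). */
    #ifdef PCMK_PUBLIC_BUILD
    #  define PCMK_API __attribute__((visibility("default")))
    #else
    #  define PCMK_API
    #endif

    PCMK_API const char *pcmk_meta_name(const char *field); /* committed  */
    char *pcmkpriv_expand_template(const char *tmpl);        /* no promise */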

-- 
Poki


pgprD265u_74B.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Impact of changing Pacemaker daemon names on other projects?

2018-04-16 Thread Jan Pokorný
On 29/03/18 11:13 -0500, Ken Gaillot wrote:
> As I'm sure you've seen, there is a strong sentiment on the users list
> to change all the Pacemaker daemon names in Pacemaker 2.0.0, mainly to
> make it easier to read the logs.
> 
> This will obviously affect any other scripts and projects that look for
> the old names. I'd like to hear more developer input on how far we
> should go with this, and how much or little of a headache it will
> cause. I'm interested in both the public projects that use pacemaker
> (crmsh, pcs, sbd, dlm, openstack) and one-off scripts that people
> commonly put together.
> 
> In order of minimum impact to maximum impact, we could actually do this
> in stages:
> 
> 1. Log tags: This hopefully wouldn't affect anyone. For example, from
> 
> Mar 12 12:10:49 [11120] node1 pacemakerd: info:
> crm_log_init: Changed active directory to /var/lib/pacemaker/cores
> 
> to
> 
> Mar 12 12:10:49 [11120] node1 pcmk-launchd: info:
> crm_log_init: Changed active directory to /var/lib/pacemaker/cores
> 
> 2. Process names: what shows up in "ps". I'm hoping this would affect
> very little outside code, so we can at least get this far.
> 
> 3. Library names: for example, -lstonithd to -lpcmk-fencing. Other
> projects would need their configure script to auto-detect which is
> available. Not difficult, but it makes all older versions of other
> projects incompatible with Pacemaker 2.0. This is mostly what I want
> feedback on, whether this is a good idea. The only advantage is
> consistency and clarity.

Good news is that pkg-config/pkgconf (PKG_CHECK_MODULES et al.
Autoconf macros) honours names of *.pc files, hence compatibility
can be maintained with symlinks.
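
For instance (module/path names below are made up for illustration,
not the actual pacemaker artifacts), something along these lines keeps
old-style consumers building after a rename:

  # install a symlink under the old .pc name next to the renamed one
  ln -s pacemaker-fencing.pc /usr/lib64/pkgconfig/stonithd.pc

  # pre-existing configure checks keep resolving
  pkg-config --cflags --libs stonithd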

> 4. Public API symbols: for example, crm_meta_name() ->
> pcmk_meta_name(). This would be a huge project with huge impact, and
> will definitely not be done for 2.0.0. We would immediately start using
> the new convention for new API symbols, and more slowly update existing
> ones (with compatibility wrappers for the old names).

Value added here would be putting some commitment behind the "true
public API" when the symbols get sifted carefully, leaving some other
naming prefixes reserved for private only ones (without any commitment
whatsoever).

-- 
Poki


pgpbl4MhZnKdK.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] New challenges with corosync 3/kronosnet + pacemaker

2018-02-19 Thread Jan Pokorný
On 09/02/18 17:55 -0600, Ken Gaillot wrote:
> On Fri, 2018-02-09 at 18:54 -0500, Digimer wrote:
>> On 2018-02-09 06:51 PM, Ken Gaillot wrote:
>>> On Fri, 2018-02-09 at 12:52 -0500, Digimer wrote:
>>>> On 2018-02-09 03:27 AM, Jan Pokorný wrote:
>>>>> there is certainly whole can of these worms, put first that
>>>>> crosses my mind: performing double (de)compression on two levels
>>>>> of abstraction in the inter-node communication is not very
>>>>> clever, to put it mildly.
>>>>> 
>>>>> So far, just pacemaker was doing that for itself under certain
>>>>> conditions, now corosync 3 will have it's iron in this fire
>>>>> through kronosnet, too.  Perhaps something to keep in mind to
>>>>> avoid exercises in futility.
>>>> 
>>>> Can pacemaker be told to not do compression? If not, can that be
>>>> added in pacemaker v2?
>>> 
>>> Or better yet, is there some corosync API call we can use to
>>> determine whether corosync/knet is using compression?
>>> 
>>> There's currently no way to turn compression off in Pacemaker,
>>> however it is only used for IPC messages that pass a fairly high
>>> size threshold, so many clusters would be unaffected even without
>>> changes.
>> 
>> Can you "turn off compression" but just changing that threshold to
>> some silly high number?
> 
> It's hardcoded, so you'd have to edit the source and recompile.

FTR, since half a year ago, I've had some resources noted for further
investigation on this topic of pacemaker-level compression -- since
it compresses XML, there are some specifics of the input that suggest
more effective processing is possible.

Indeed; there's a huge, rigorously maintained compression benchmark
for non-binary files that coincidentally also covers XML (despite
being presumably more text-oriented than structure-oriented):

  http://mattmahoney.net/dc/text.html

Basically, I can see two (three) categories of possible optimizations:

0. pre-fill the scan dictionary for the compression algorithm
   with sequences that are statistically (constantly) most frequent
   (a priori known tag names?)

1. preprocessing of XML to allow for more efficient generic
   compression (like with bzip2 that is currently utilized), e.g.

   * XMill
 - https://homes.cs.washington.edu/~suciu/XMILL/

   * XWRT (XML-WRT)
 - https://github.com/inikep/XWRT

2. more efficient algorithms as such for non-binary payloads
   (the benchmark above can help with selection of the candidates)
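
As a quick, local reality check for category 2, one can simply compare
how the readily available generic compressors fare on a real CIB dump
-- a rough sketch only, figures will vary wildly per cluster:

  cibadmin --query > cib.xml     # or any sizable pacemaker XML at hand
  for c in "gzip -9" "bzip2 -9" "xz -9"; do
      printf '%-10s %8s bytes\n' "$c" "$($c -c < cib.xml | wc -c)"
  done

Nothing scientific, but enough to see whether merely swapping the
generic algorithm would be worth the trouble.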

* * *

That being said, there are legitimate reasons to want merely the
high-level messaging to be involved with compression, because that's
the only layer intimate with the respective application-specific
data and hence can provide optimal compression methods beyond
the reach of the generic ones.

-- 
Poki


pgpbbnYCqJu0U.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Error when linking to libqb in shared library

2018-02-12 Thread Jan Pokorný
[let's move this to developers list]

On 12/02/18 07:22 +0100, Kristoffer Grönlund wrote:
> (and especially the libqb developers)
> 
> I started hacking on a python library written in C which links to
> pacemaker, and so to libqb as well, but I'm encountering a strange
> problem which I don't know how to solve.
> 
> When I try to import the library in python, I see this error:
> 
> --- command ---
> PYTHONPATH='/home/krig/projects/work/libpacemakerclient/build/python' 
> /usr/bin/python3 
> /home/krig/projects/python-pacemaker/build/../python/clienttest.py
> --- stderr ---
> python3: utils.c:66: common: Assertion `"implicit callsite section is 
> observable, otherwise target's and/or libqb's build is at fault, preventing 
> reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
> ---
> 
> This appears to be coming from the following libqb macro:
> 
> https://github.com/ClusterLabs/libqb/blob/master/include/qb/qblog.h#L352
> 
> There is a long comment above the macro which if nothing else tells me
> that I'm not the first person to have issues with it, but it doesn't
> really tell me what I'm doing wrong...
> 
> Does anyone know what the issue is, and if so, what I could do to
> resolve it?

Something similar has been reported already:
https://github.com/ClusterLabs/libqb/pull/266#issuecomment-356855212

and the fix is proposed:
https://github.com/ClusterLabs/libqb/pull/288/commits/f9f180cdbcb189b6590e541502b1de658c81005e
https://github.com/ClusterLabs/libqb/pull/288

But the suitability depends on particular usecase.

I guess you are linking your python extension with one of the
pacemaker libraries (directly or indirectly to libcrmcommon), and in
that case, you need to rebuild pacemaker with the patched libqb[*] for
the whole arrangement to work.  Likewise in that case, as you may be
aware, the "API" is quite uncommitted at this point, stability hasn't
been of importance so far (because of the handles into pacemaker being
mostly abstracted through built-in CLI tools for the outside players
so far, which I agree is encumbered with tedious round-trips, etc.).
There's a huge debt in this area, so some discretion and perhaps
feedback on which functions are indeed proper-API-worthy is advised.

[*]
shortcut 1: just recompile pacemaker with those extra
/usr/include/qb/qblog.h modifications as of the
referenced commit
shortcut 2: unlike the above, which can perhaps be tolerated widely,
this one is certainly for local development only: recompile
pacemaker with CPPFLAGS=-DQB_KILL_ATTRIBUTE_SECTION
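
Spelled out, shortcut 2 boils down to roughly this (a sketch only,
exact steps depend on your distribution/source layout):

  git clone https://github.com/ClusterLabs/pacemaker.git && cd pacemaker
  ./autogen.sh
  ./configure CPPFLAGS=-DQB_KILL_ATTRIBUTE_SECTION
  make && make install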

Hope this helps.

-- 
Jan (Poki)


pgpk1KYaoyU9n.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] New challenges with corosync 3/kronosnet + pacemaker

2018-02-09 Thread Jan Pokorný
Hello,

there is certainly whole can of these worms, put first that crosses
my mind: performing double (de)compression on two levels of abstraction
in the inter-node communication is not very clever, to put it mildly.

So far, just pacemaker was doing that for itself under certain
conditions, now corosync 3 will have it's iron in this fire through
kronosnet, too.  Perhaps something to keep in mind to avoid
exercises in futility.

-- 
Jan (Poki)


pgp047OxBfbZe.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [ClusterLabs] [IMPORTANT] Fatal, yet rare issue verging on libqb's design flaw and/or it's use in corosync around daemon-forking

2018-01-29 Thread Jan Pokorný
[developers list subscribers, kindly jump to "Current libqb PR" part]

On 22/01/18 11:29 +0100, Jan Friesse wrote:
>> It was discovered that corosync exposes itself for a self-crash
>> under rare circumstance whereby corosync executable is run when there
>> is already a daemon instance around (does not apply to corosync serving
>> without any backgrounding, i.e. launched with "-f" switch).
>> 
>> Such a circumstance can be provoked unattendedly by the third party,
>> incl. "corosync -v" probe triggered internally by pcs (since 9e19af58
>> ~ 0.9.145), which is what makes the root cause analysis of such
>> inflicted crash somewhat difficult to guess & analyze (the other
>> reason may be rather runaway core dump if produced at all due to
>> fencing coming, based on the few observed cases).
>> 
>> The problem comes from the fact that corosync is arranged such that
>> the logging is set up very early, even before the main control flow
>> of the program starts.  And part of this early enabling is also
>> starting "blackbox" recording, which uses mmap'd file stored in
>> /dev/shm that, moreover, only varies on PID that is part of the file
>> name -- and when corosync performs the fork so as to detach itself
>> from the environment it started it, such PID is free to be reused.
>> And against all odds, when that happens with this fresh new corosync
>> process, it happily mangles the file underneath the former daemon one,
>> leading to crashes indicated by SIGBUS, rarely also SIGFPE.
>> 
>> * * *
>> 
>> There are two quick mitigation techniques that can be readily applied:
>> 
>> 1. make on-PATH corosync executable rather a "careful" wrapper:
>> 
>>cp -a /sbin/corosync /sbin/corosync.orig
>>> /sbin/corosync cat <<EOF
>>#!/bin/sh
>>test "\$1" != -v || { echo "$(/sbin/corosync.orig -v)"; exit 0; }
>>exec /sbin/corosync.orig "\$@"
>>EOF
>> 
>>(when using SELinux, check the function and possibly fix the
>>contexts on these files)
>> 
>> 2. extend the PID space so as to move its wrap-around (precondition
>>for reproducing the issue) further to the future (hence make the
>>critical moments spread less frequently, lowering the overall
>>probability), for instance with Linux kernel:
>> 
>>echo 4194303 > /proc/sys/kernel/pid_max

or with recent enough Linux kernel, thanks to the fact that lowest
300 PIDs won't get recycled for being reserved (mutually exclusive
to measure 2. above):

3. start corosync service in a separate PID namespace, e.g. it gets
   started with initscript (or alike script, which may happen also
   under systemd supervision):

   > /sbin/corosync-daemon cat <<EOF
   <&- 2>&- unshare -fp --mount-proc sh -c \\
   "corosync \$*& while kill -0 -1 2>/dev/null;do wait&&sleep 1||exit \\\$?;done"&
   EOF
   chmod +x /sbin/corosync-daemon
   echo "prog=corosync-daemon" >> /etc/sysconfig/corosync

[Apparently, it would be easier to just wrap the non-forking corosync
(-f switch) in substantially easier way, but then, it's questionable
why this is not the default where possible (e.g. under systemd).]
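
[For illustration, that easier wrapping would boil down to roughly this
single line (a sketch only, unit/initscript integration and hardening
left out):

    unshare -fp --mount-proc /sbin/corosync -f

i.e. the non-forking corosync then lives in its own PID namespace with
a very low PID, which -- per the reserved-PIDs note above -- later
short-lived probes in the host namespace should never collide with.]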

>> * * *
>> 
>> To claim this problem is fixed, at least all three mentioned components
>> will have to do its part to limit the problem in the future:
>> 
>> - corosync (do something new after fork?)
> 
> Patch proposal:
> 
> https://github.com/corosync/corosync/pull/308

What I propose is a solution addressing mix-and-match scenarios
amongst components, with the fix split between corosync:

  https://github.com/corosync/corosync/pull/309

and libqb:

  https://github.com/ClusterLabs/libqb/pull/293

For the former, it may indeed be appealing to add the possibility to
toggle blackbox logging directly in corosync configuration, but
that's not the long-term continuity fix.

Note that corosync patched like this alone is sufficient to prevent
the crashes at hand fully, patched libqb then only adds smoothness
to the PID-rollover-while-blackbox-in-use scenario.

> Also problem is really very rare and reproducing it is quite hard.

Agreed, though it's not that hard to trigger the condition
purposefully, see the libqb PR containing also the contrived
reproducer.

>> - libqb (be more careful about the crashing condition?)

Current libqb PR shifts the responsibility to avoid the precondition
(clash on the PID recycled in the interim) to the libqb client that
initiates blackbox logging prior to the fork wherein the parent
immediately terminates.

There's hardly a way to arrange it the other way around, since while there
are limited ways for libqb to hook on the forking in the main process
(cf. pthread_atfork(3)), they are not generic enough (cf. clone(2)),
and verifying whether PID has changed upon each log entry would add
unjustified overhead, amplified further by the fact glibc is not
caching getpid(2) value anymore:

  https://sourceware.org/bugzilla/show_bug.cgi?id=15392

Ironically in this context, libqb caches "logging source" PID on its
own, hence is practically untouched.  On the other hand, that's w

Re: [ClusterLabs Developers] Reference to private bugzillas in commit messages

2018-01-09 Thread Jan Pokorný
On 09/01/18 18:24 +, Adam Spiers wrote:
> Here are the abbreviations currently used within openSUSE and SUSE:
> 
>
> https://en.opensuse.org/openSUSE:Packaging_Patches_guidelines#Current_set_of_abbreviations
> 
> Also see this issue (even if you aren't involved with openSUSE) where I go
> into depth on considerations relating to the use of these shorthand
> references:
> 
>https://github.com/openSUSE/obs-service-tar_scm/issues/207

The links with apparently serial identifiers are the least of a problem
even if no URL rewriting applies (frankly, I would question the
operator's sanity if a URL scheme like that changed and widely spread
direct access links ceased to get users to the new locations), as it's
expected that the intended destination will be easily recovered anyway.

> and also this ancient 2013 thread which highlights that a "foo#1234" format
> is too simplistic for references to sites like GitHub:
> 
>https://lists.opensuse.org/opensuse-packaging/2013-06/msg00032.html
> 
> My personal take is that URLs were designed by very clever people for
> exactly this hyperlinking purpose, have been proven over multiple decades,
> and are universally understood by both humans and all kinds of software.  So
> why on earth reinvent the wheel just for this microscopic use case?  To save
> a few bytes?

/me personally concurs

Speaking of my personal tastes, I also like the semiformalized tail
tags akin to the widely spread "Signed-off-by: name ".  For
instance, I use it extensively in clufter, e.g.:

https://pagure.io/clufter/c/1db634c742bad8546fb658f2e4cb857c6f68b37c

(for anyone curious, I do my best to check referenced bugs are public)

I'd say it's also more polite than pushing the tracker identifier to
the prominent position in the message summary.

-- 
Jan (Poki)


pgp2nhpqaP6Oc.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Reference to private bugzillas in commit messages

2018-01-09 Thread Jan Pokorný
On 09/01/18 09:56 -0600, Ken Gaillot wrote:
> The acronyms, like any other, you just have to pick up over time with
> experience. I'll add the ones I know to the Pacemaker Development
> document, which are:
> 
>   LFBZ - old Linux Foundation bugzilla for the Linux-HA project - https
> ://developerbugs.linuxfoundation.org/buglist.cgi?product=Pacemaker

I fail to find a single reference to this, as opposed to mere "LF" in
the repo log.

>   CLBZ - ClusterLabs bugzilla - https://bugs.clusterlabs.org/

Similarly, these were historically referred often just as
"cl#"

>   RHBZ - Red Hat bugzilla - https://bugzilla.redhat.com/
> 
>   BSC - SuSE bugzilla - https://bugzilla.suse.com/index.cgi

There are also "bnc" entries in the repo log that, I grok, stands for
bugzilla.novell.com.

> Sometimes you'll see bz without anything more specific, which is
> usually LFBZ or CLBZ depending on when it was written.

-- 
Jan (Poki)


pgpRo9i37iaDd.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Reference to private bugzillas in commit messages

2018-01-09 Thread Jan Pokorný
On 09/01/18 10:35 +, Adam Spiers wrote:
> Andrei Borzenkov  wrote:
>> On Tue, Jan 9, 2018 at 11:23 AM, Kristoffer Grönlund
>>  wrote:
>>> Andrei Borzenkov  writes:
>>> 
 I wonder what is the policy here.
 
 commit 7b7521c95d635d8b4cf04f645a6badc1069c6b46
 Author: liangxin1300 
 Date:   Fri Dec 29 15:27:40 2017 +0800
 
fix: ui_resource: Using crm_failcount instead of
 crm_attribute(bsc#1074127)
 
 
 Apart from the obvious - how would contributor know what "bsc" is in the
 first place and how to check it - attempt to access
 https://bugzilla.suse.com/show_bug.cgi?id=1074127 gives
 
 You are not authorized to access bug #1074127
 
 Randomly checking other bsc# references gives the same "permissions
 denied" result.
>>> 
>>> We include those bugzilla references to make it easier for ourselves to
>>> connect fixes to bugs in the rpm changelogs (for example). I can
>>> honestly say that I don't know if there is a policy or what it is in
>>> that case, it was "established practice" when I joined the project.

Well, in fact there is no such official policy around this, but
I tried to change that in the past:

  https://github.com/ClusterLabs/pacemaker/pull/1119

as this no-open-access hubris (seconded by related
no-change-selfcontainment) disturbs me _a lot_ in the context
of _free_ (as in freedom) software.  Just think about it.

>>> I think Red Hat does the same?

The above reference gives you an answer that this camp is also not
guilt-free here (https://github.com/ClusterLabs/pacemaker/pull/887).

>>> You should be able to just create an account in the SUSE bugzilla and
>>> then have access to the bug.

What's the point of requiring the audience to be registered, then?

>> I have private account on (open)SUSE bugzilla and I'm denied access to
>> these bugs.
> 
> Some commercial products in the (open)SUSE bugzilla, presumably
> including SUSE Linux Enterprise High Availability, are configured such
> that newly submitted bugs default to being private to SUSE employees
> only, in order to protect potentially confidential information
> submitted by our customers.  My best guess is that the bug referenced
> above is one of these bugs which defaulted to private.
> 
> However, there is a solution!  Assuming there is no confidential
> information in a bug such as log files or other info provided by one
> of our customers

AFAIK, the privacy can be set on a per-comment/attachment basis
in Bugzilla instances (ok, with the associated risk added that something
will leak unintentionally)...

> any SUSE employee can set any of these bugs as being visible
> externally.  And indeed this should be done as much as possible.

... however, this is a moot discussion we would be better off avoiding
in the first place as:

1. the changes tracked in the repo would preferably be self-contained
   as mentioned

   - on random commit access, the change should be comprehensible just
 by the means of code + in-code comments + commit message, without
 any reliance on external tracker or on out-of-repo PR comments
 (e.g., I don't understand why the explanation did not go into
 the commit itself in case of
 https://github.com/ClusterLabs/pacemaker/pull/1402) -- come on
 people, when the code base is to stand the test of time, is it
 more likely that the context survives in the proprietary
 free-of-charge service without massive replication, or in the
 bits being indivisible part of the distributed repo?

2. if the bug identifier is absolutely necessary for some reason,
   ClusterLabs host the Bugzilla instance at
   https://bugs.clusterlabs.org/

   - items in other trackers could be cross-linked from there

> If there *is* confidential information, but it is desired for the fix
> to be public (e.g. referenced within a commit message in, say, the
> Pacemaker repository), then I would recommend my colleagues to ensure
> that there are two bugs: a private one containing the confidential
> information, which links to a public one which contains all the
> information which can be shared with the upstream FL/OSS project.

Proper problem statement in the commit message accompanying the fix
would alleviate these sorts of redundancies, and would lead to
improvements on the non-code/soft-skills aspects of the contributions,
IMHO.

> Kristoffer, does that approach work for you and your team?

-- 
Jan (Poki)


pgpjzCR_jeun_.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] [ANTICIPATED FAQ] libqb v1.0.3 vs. binutils' linker (Was: [Announce] libqb 1.0.3 release)

2017-12-21 Thread Jan Pokorný
I've meant to spread the following piece of advice but forgot...

On 21/12/17 17:45 +0100, Jan Pokorný wrote:
> On 21/12/17 14:40 +, Christine Caulfield wrote:
>> We are pleased to announce the release of libqb 1.0.3
>> 
>> 
>> Source code is available at:
>> https://github.com/ClusterLabs/libqb/releases/download/v1.0.3/libqb-1.0.3.tar.xz
>> 
>> 
>> This is mainly a bug-fix release to 1.0.2
>> 
>> [...]
> 
> Thanks Chrissie for the release; I'd like to take this opportunity to
> pick on one particularly important thing for "latest greatest pursuing"
> system deployments and distributions:
> 
>> High: bare fix for libqb logging not working with ld.bfd/binutils 2.29+
> 
> Together with auxiliary changes likewise present in v1.0.3, this
> effectively allows libqb to fulfil its logging duty properly also
> when any participating binary part (incl. libqb as a library itself)
> was build-time linked with a standard linker (known as ld or ld.bfd)
> from binutils 2.29 or newer.  Previous libqb releases would fail
one way or another to process the messages stemming from the ordinary way
> to issue them under these circumstances (and unless the linker feature
> based offloading was bypassed, which happens, e.g., for selected
> architectures [PowerPC] or platforms [Cygwin] automatically).

So now, you may face these questions:

Q1: Given the fact there was no SONAME bump (marking binary
compatibility being preserved) with libqb v1.0.3, do I have
to rebuild everything depending on libqb once I deploy this
new, "log-fixing" version?

A1: First, yes, public-facing ABI remains unchanged.  Second, it
depends on whether these dependent components have anything in
common with ld linker from binutils 2.29+:

- every component that has already been build-time linked using
  such a linker prior to deploying the log-fixing libqb version
  (just this v1.0.3 and newer if we talk about official releases)
  SHOULD be recompiled with the log-fixing libqb in the build-time
  link (note that libqb pre-1.0.3 will likewise break the logging
  of the run-time linked-by programs when build-time linked using
  such a linker, but that's off-topic as we discuss
  post-deployment of the log-fixing version)

- for extra sanity, you may consider rebuilding such components,
  which will gain an advantage in case there's a risk of libqb
  being downgraded to "pre-1.0.3 version that was built-time
  linked with binutils 2.29+" -- but the mitigation measure will
  ONLY have an effect in case the component in question uses
  the QB_LOG_INIT_DATA macro defined in the qblog.h header file of
  libqb (e.g., pacemaker does)

- otherwise, no component needs rebuilding if it was previously
  built using pre-2.29 binutils' linker, it shall combine with
  new log-fixing libqb (build-time linked with whichever binutils'
  linker) just fine

- mind that some minor exceptions do apply (see the end of the
  quoted response wrt. architectures and platforms) but are left
  out from the previous consideration
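
Side note: a crude way to enumerate what on a given machine run-time
links against libqb (i.e. the candidates for the considerations above)
could be something like this -- paths purely illustrative:

  for f in /usr/sbin/* /usr/bin/* /usr/lib64/*.so*; do
      ldd "$f" 2>/dev/null | grep -q 'libqb\.so' && echo "$f"
  done

Whether any of those then actually needs a rebuild still follows the
build-time linker criteria above, of course.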


Please respond on either or both lists should you have more
questions.

I am far from testing any possible combination of mixing various
build-time linkers/libqb versions per partes for the software pieces
that will eventually get linked together, but tried to cover that
space exhaustively in the limited dimensions, so let's say I have
some insights and intuition, and we can always test the particular
set of input configuration variables by hand to get any wiser.

-- 
Jan (Poki)


pgpaYKgurhqRB.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [libqb] heads-up: logging not working with binutils-2.29 standard linker (ld.bfd)

2017-12-15 Thread Jan Pokorný
On 19/10/17 22:49 +0200, Jan Pokorný wrote:
> The reconciling patchset is not merged yet, but I'd say it's in the
> good shape: https://github.com/ClusterLabs/libqb/pull/266
> 
> Testing is requested, of course ;)

We finally got to merge it with some ulterior changes, and there's
just a few more cleanups pending till the upcoming new release.

-- 
Jan (Poki)


pgpMAetK71JpN.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-12-05 Thread Jan Pokorný
On 02/12/17 21:00 +0100, Jan Pokorný wrote:
> https://jdebp.eu/FGA/unix-daemon-readiness-protocol-problems.html
> 
> Quoting it:
>   Of course, only the service program itself can determine exactly
>   when this point [of being ready, that, "is about to enter its main
>   request processing loop"] is.
> 
> There's no way around this.
> 
> The whole objective of OCF standard looks retrospectively pretty
> sidetracked through this lense: instead of pulling weight of the
> semiformal standardization body (comprising significant industry
> players[*]) to raise awareness of this solvable reliability
> discrepancy, possibly contributing to generally acknowledged,
> resource manager agnostic solution (that could be inherited the
> next generation of the init systems), it just put a little bit of
> systemic approach to configuration management and monitoring on
> top of the legacy of organically grown "good enough" initscripts,
> clearly (because of inherent raciness and whatnot) not very suitable
> for the act of supervision nor for any sort of reactive balancing
> to satisfy the requirements (crucial in HA, polling interval-based
> approach leads to losing trailing nines needlessly for cases you
> can be notified about directly).

... although there was clearly a notion of employing asynchronous
mechanisms (one can infer, for technically more sound binding between
the resource manager and the supervised processes) even some 14+ years
ago:
https://github.com/ClusterLabs/OCF-spec/commit/2331bb8d3624a2697afaf3429cec1f47d19251f5#diff-316ade5241704833815c8fa2c2b71d4dR422

> Basically, that page also provides an overview of the existing
> "formalized intefaces" I had in mind above, in its "Several
> incompatible protocols with low adoption" section, including
> the mentioned sd_notify way of doing that in systemd realms
> (and its criticism just as well).
> 
> Apparently, this is a recurring topic because to this day, the problem
> hasn't been overcome in generic enough way, see NetBSD, as another
> example:
> https://mail-index.netbsd.org/tech-userlevel/2014/01/28/msg008401.html
> 
> This situation, caused by a lack of interest to get things right
> in the past plus OS ecosystem segmentation playing against any
> conceivable attempt to unify on a portable solution, is pretty
> unsettling :-/
> 
> [*] see https://en.wikipedia.org/wiki/Open_Cluster_Framework

-- 
Jan (Poki)


pgppERVlwhH_z.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-12-02 Thread Jan Pokorný
On 07/11/17 02:01 +0100, Jan Pokorný wrote:
> On 07/11/17 01:02 +0300, Andrei Borzenkov wrote:
>> 06.11.2017 22:38, Valentin Vidic пишет:
>>> On Fri, Oct 13, 2017 at 02:07:33PM +0100, Adam Spiers wrote:
>>>> I think it depends on exactly what you mean by "synchronous" here. You can
>>>> start up a daemon, or a process which is responsible for forking into a
>>>> daemon, but how can you know for sure that a service is really up and
>>>> running?  Even if the daemon ran for a few seconds, it might die soon 
>>>> after.
>>>> At what point do you draw the line and say "OK start-up is now over, any
>>>> failures after this are failures of a running service"?  In that light,
>>>> "systemctl start" could return at a number of points in the startup 
>>>> process,
>>>> but there's probably always an element of asynchronicity in there.
>>>> Interested to hear other opinions on this.
>>> 
>>> systemd.service(5) describes a started (running) service depending
>>> on the service type:
>>> 
>>> simple  - systemd will immediately proceed starting follow-up units (after 
>>> exec)
>>> forking - systemd will proceed with starting follow-up units as soon as
>>>   the parent process exits
>>> oneshot - process has to exit before systemd starts follow-up units
>>> dbus- systemd will proceed with starting follow-up units after the
>>>   D-Bus bus name has been acquired
>>> notify  - systemd will proceed with starting follow-up units after this
>>>   notification message has been sent
>>> 
>>> Obviously notify is best here
>> 
>> forking, dbus and notify all allow daemon to signal to systemd that
>> deamon is ready to service request. Unfortunately ...
>> 
>>> but not all daemons implement sending
>>> sd_notify(READY=1) when they are ready to serve clients.
>>> 
>> 
>> ... as well as not all daemons properly daemonize itself or register on
>> D-Bus only after they are ready.
> 
> Sharing the sentiment about the situation, arising probably primarily
> from daemon authors never been pushed to indicate full ability to
> provide service precisely because 1/ it's not the primary objective of
> init systems -- the only thing they would need to comply with
> regarding getting these daemons started (as opposed to real
> service-oriented supervisors, which is also the realm of HA, right?),
> and 2/ even if it had been desirable to indicate that, no formalized
> interface (and in turn, system convolutions) that would become
> widespread was devised for that purpose.  On the other hand, sd_notify
> seems to reconcile that in my eyes (+1 to Valetin's qualifying it
> the best of the above options) as it doesn't impose any other effect
> (casting extra interpretation on, say, a fork event makes it
> possibly not intended or at least not-well-timed side-effect of the
> main, intended effect).

I had some information deficits that only now are becoming catered.
Specifically, I discovered this nice, elaborate study on the
"Readiness protocol problems with Unix dæmons":
https://jdebp.eu/FGA/unix-daemon-readiness-protocol-problems.html

Quoting it:
  Of course, only the service program itself can determine exactly
  when this point [of being ready, that, "is about to enter its main
  request processing loop"] is.

There's no way around this.

The whole objective of OCF standard looks retrospectively pretty
sidetracked through this lens: instead of pulling weight of the
semiformal standardization body (comprising significant industry
players) to raise awareness of this solvable reliability
discrepancy, possibly contributing to generally acknowledged,
resource manager agnostic solution (that could be inherited the
next generation of the init systems), it just put a little bit of
systemic approach to configuration management and monitoring on
top of the legacy of organically grown "good enough" initscripts,
clearly (because of inherent raciness and whatnot) not very suitable
for the act of supervision nor for any sort of reactive balancing
to satisfy the requirements (crucial in HA, polling interval-based
approach leads to losing trailing nines needlessly for cases you
can be notified about directly).

Basically, that page also provides an overview of the existing
"formalized intefaces" I had in mind above, in its "Several
incompatible protocols with low adoption" section, including
the mentioned sd_notify way of doing that in systemd realms
(and its criticism just as well).

Apparently, this is a recurring topic because to this day, the problem
hasn't been overcome in generic enough way, see NetBSD, as another
example:
https://mail-index.netbsd.org/tech-userlevel/2014/01/28/msg008401.html

Re: [ClusterLabs Developers] New LVM resource agent name (currently LVM-activate)

2017-11-24 Thread Jan Pokorný
On 24/11/17 07:10 +0100, Kristoffer Grönlund wrote:
> Jan Pokorný  writes:
>> Wanted to add a comment on IPaddr vs. IPaddr2 (which, as mentioned,
>> boils down to ifconfig vs. iproute2) situation being used for
>> comparison -- this is substantially a different story, as iproute2
>> (and in turn, IPaddr2) is Linux-only, while the whole stack is more
>> or less deployable on various *nixes so having two agents in parallel,
>> one portable but with some deficiences + one targeted and more capable
>> makes a damn good sense.  Cannot claim the same here.
> 
> While there may be good reason from an implementation standpoint for
> having two agents, that doesn't mean that it makes sense from a user
> perspective. It's just not particularly beautiful or clear to configure
> IP addresses using "IPaddr2". The preferred solution from a user
> perspective would certainly have been to have a single interface to IP
> addresses which uses whatever means are available on the current
> platform.

I agree that it would be convenient configuration-wise, not being
held back early on the tool selection based on the particular details,
but they are somewhat important:

- which network interface configuration front-end is available
- does this front-end support IPv6
- does this front-end support specifications of bonding/teaming/magic
  interfaces and with the requested granularity
- is load-balancing a la CLUSTERIP iptables extension allowing the
  agent to be a meaningful clone

I.e., you will get a different "personality" of the hypothetical
unified agent depending on the circumstances, meaning the agent will
not be a clear winner of unity vs. the confusion arising from the
non-portability of some configuration traits.

The implied huge internal complexity can be overcome by sticking
to some best practices -- proper decomposition, modularity,
build-time macro-based conditionalizing based on what build
configuration script detected, etc.  But if these "personalities" are
prominent (like Linux distributions being compatible with iproute2
incl. support for IPv6 and details of particular magic interfaces +
CLUSTERIP extension nowadays) vs. portable minimum, I still think it
made sense to make the cut, just the names may be a bit unfortunate.

> In the same way, a user wanting to configure LVM would encounter a
> variety of agents named "lvm", "lvm2", "lvm-ng", "lvm-maybe" and would
> most likely end up digging through mailing list posts, reference manuals
> and XML metadata to maybe figure out which one is a) up to date and b)
> appropriate for the current platform - since even though there is
> probably a clear explanation for which one to use somewhere, there is no
> way for a new user to know where to find that explanation.

Agree in the sense that for LVM, I don't see discerned "personalities"
like I could find for IP agents.

> At least, that's my opinion, clearly opinions differ on this matter ;)

Nobody has a patent on what's best (even less so a bystander like
me), that's why discourse is vital (discuss early, discuss often).

-- 
Jan (Poki)


pgptaDe3kVcX_.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] New LVM resource agent name (currently LVM-activate)

2017-11-23 Thread Jan Pokorný
[this follow-up is mostly to re-CC some people that were gradually
 omitted as the thread progressed, I am not sure who's subscribed
 and who is not among them]

On 23/11/17 20:27 +0100, Jan Pokorný wrote:
> On 23/11/17 16:54 +0800, Eric Ren wrote:
>>> What about VolumeGroup (in the tradition of Filesystem, for instance)?
> 
>> In the LVM-activate, we will support both all VG activation and only
>> one specified LV activation depending on the parameters.
> 
> This non-educated suggestion was driven solely by the fact that VG
> needs to always be specified.
> 
>>> Or why not shoot for an LVM merge (plus proper versioning to tell
>>> the difference)?
>> 
>> You mean merging LVM-activate with the existing LVM?
> 
> Yep, see below.
> 
>> Here was a long discussion about that:
>> 
>> https://github.com/ClusterLabs/resource-agents/pull/1040
> 
> Honestly, was only vaguely aware of some previous complaints from
> Dejan in a different thread, but otherwise unlightened on what's
> happening.
> 
> And I must admit, I am quite sympathetic to the non-articulated wish
> of knowing there's a plan to give a new spin to enclustered LVM
> beforehand -- afterall, adoption depends also on whether the situation
> is/will be clear to the userbase.  Some feedback could have been
> gathered earlier -- perhaps something to learn some lessons from
> for the future.
> 
> Putting the "community logistics" issue aside...
> 
> Bear with me, I am only very slightly familiar with the storage field.
> I suspect there are some framed pictures of "LVM-activate" use that
> are yet to be recognized.  At least it looks to me like one of them
> is to couple+serialize lvmlockd agent instance followed with
> "LVM-activate".  In that case, the latter seems to be spot-on naming
> and I'd rather speculate about the former to make this dependency
> clearer, renaming it to something like LVM-prep-lvmlockd :-)
> 
> [At this point, I can't refrain myself from reminding how handy the
>  "composability with parameter reuse" provision in RGManager was,
>  and how naturally it would wrap such a pairing situation (unless
>  the original assumption doesn't hold).  I don't really see how
>  that could be emulated in pacemaker, it would definitely be
>  everything but a self-contained cut.]

Wanted to add a comment on IPaddr vs. IPaddr2 (which, as mentioned,
boils down to ifconfig vs. iproute2) situation being used for
comparison -- this is substantially a different story, as iproute2
(and in turn, IPaddr2) is Linux-only, while the whole stack is more
or less deployable on various *nixes so having two agents in parallel,
one portable but with some deficiencies + one targeted and more capable
makes a damn good sense.  Cannot claim the same here.

-- 
Jan (Poki)


pgpe3e_gt1niP.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] New LVM resource agent name (currently LVM-activate)

2017-11-23 Thread Jan Pokorný
On 23/11/17 16:54 +0800, Eric Ren wrote:
>> What about VolumeGroup (in the tradition of Filesystem, for instance)?

> In the LVM-activate, we will support both all VG activation and only
> one specified LV activation depending on the parameters.

This non-educated suggestion was driven solely by the fact that VG
needs to always be specified.

>> Or why not shoot for an LVM merge (plus proper versioning to tell
>> the difference)?
> 
> You mean merging LVM-activate with the existing LVM?

Yep, see below.

> Here was a long discussion about that:
> 
> https://github.com/ClusterLabs/resource-agents/pull/1040

Honestly, I was only vaguely aware of some previous complaints from
Dejan in a different thread, but otherwise unenlightened about what's
happening.

And I must admit, I am quite sympathetic to the non-articulated wish
of knowing there's a plan to give a new spin to enclustered LVM
beforehand -- after all, adoption also depends on whether the situation
is/will be clear to the userbase.  Some feedback could have been
gathered earlier -- perhaps something to learn some lessons from
for the future.

Putting the "community logistics" issue aside...

Bear with me, I am only very slightly familiar with the storage field.
I suspect there are some framed pictures of "LVM-activate" use that
are yet to be recognized.  At least it looks to me like one of them
is to couple+serialize lvmlockd agent instance followed with
"LVM-activate".  In that case, the latter seems to be spot-on naming
and I'd rather speculate about the former to make this dependency
clearer, renaming it to something like LVM-prep-lvmlockd :-)

[At this point, I can't refrain myself from reminding how handy the
 "composability with parameter reuse" provision in RGManager was,
 and how naturally it would wrap such a pairing situation (unless
 the original assumption doesn't hold).  I don't really see how
 that could be emulated in pacemaker, it would definitely be
 everything but a self-contained cut.]

-- 
Jan (Poki)


pgps3hVzdU5Np.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] New LVM resource agent name (currently LVM-activate)

2017-11-22 Thread Jan Pokorný
On 22/11/17 15:28 -0600, David Teigland wrote:
> On Wed, Nov 22, 2017 at 03:08:19PM -0600, David Teigland wrote:
>> On Wed, Nov 22, 2017 at 02:34:31PM -0600, Chris Feist wrote:
>>> lvm2:
>>> Good - It's obvious it's a newer/better version of the lvm agent.
>>> Bad - It may be associated with the lvm2 commands which we are working on
>>> phasing out.
>> 
>> Unfortunately, the lvm project used the name "lvm2" to mean something
>> different, so giving a new meaning to the "lvm" vs "lvm2" distinction
>> would be confusing.
> 
> If nobody else finds this confusing, then don't let this get in the way of
> doing lvm2.
> 
>> lvmvg or lvmlv or lvm_vg/lvm_lv
> 
> I don't mind these.  Another idea is "lvm_agent", "lvm_ra".  I don't have
> a strong opinion about any of them.

What about VolumeGroup (in the tradition of Filesystem, for instance)?

Or why not shoot for an LVM merge (plus proper versioning to tell
the difference)?

-- 
Jan (Poki)


pgp2mkwURRAsc.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-11-06 Thread Jan Pokorný
[sorry, managed to drop most recent modifications just before sending,
fortunately got them from the editor's backups, so skip the previous
entry in the thread in favor of this one, also to avoid some typos
bleed, please]

On 07/11/17 01:02 +0300, Andrei Borzenkov wrote:
> 06.11.2017 22:38, Valentin Vidic пишет:
>> On Fri, Oct 13, 2017 at 02:07:33PM +0100, Adam Spiers wrote:
>>> I think it depends on exactly what you mean by "synchronous" here. You can
>>> start up a daemon, or a process which is responsible for forking into a
>>> daemon, but how can you know for sure that a service is really up and
>>> running?  Even if the daemon ran for a few seconds, it might die soon after.
>>> At what point do you draw the line and say "OK start-up is now over, any
>>> failures after this are failures of a running service"?  In that light,
>>> "systemctl start" could return at a number of points in the startup process,
>>> but there's probably always an element of asynchronicity in there.
>>> Interested to hear other opinions on this.
>> 
>> systemd.service(5) describes a started (running) service depending
>> on the service type:
>> 
>> simple  - systemd will immediately proceed starting follow-up units (after 
>> exec)
>> forking - systemd will proceed with starting follow-up units as soon as
>>   the parent process exits
>> oneshot - process has to exit before systemd starts follow-up units
>> dbus- systemd will proceed with starting follow-up units after the
>>   D-Bus bus name has been acquired
>> notify  - systemd will proceed with starting follow-up units after this
>>   notification message has been sent
>> 
>> Obviously notify is best here
> 
> forking, dbus and notify all allow daemon to signal to systemd that
> deamon is ready to service request. Unfortunately ...
> 
>> but not all daemons implement sending
>> sd_notify(READY=1) when they are ready to serve clients.
>> 
> 
> ... as well as not all daemons properly daemonize itself or register on
> D-Bus only after they are ready.

Sharing the sentiment about the situation, arising probably primarily
from daemon authors never been pushed to indicate full ability to
provide service precisely because 1/ it's not the primary objective of
init systems -- the only thing they would need to comply with
regarding getting these daemons started (as opposed to real
service-oriented supervisors, which is also the realm of HA, right?),
and 2/ even if it had been desirable to indicate that, no formalized
interface (and in turn, system convolutions) that would become
widespread was devised for that purpose.  On the other hand, sd_notify
seems to reconcile that in my eyes (+1 to Valentin's qualifying it
best of the above options) as it doesn't impose any other effect
(casting extra interpretation on, say, a fork event makes it possibly
not intended or at least not-well-timed side-effect of the main,
intended effect).

To elaborate more, historically, it's customary to perform double fork
in the daemons to make them as isolated from controlling terminals and
what not as possible.  But it may not be desirable to perform anything
security sensitive prior to at least the first fork, hence with
"forking", you've already lost the preciseness of "ready" indication,
unless there is some further synchronization between the parent and
its child processes (I am yet to see that in practice).  So I'd say,
unless the daemon is specifically fine-tuned, both forking and dbus
types of services are bound to carry some amount of asynchronicity as
mentioned.  To the distaste of said service supervisors that strive to
maximize service usefulness over the considerable timeframe, which is
way more than ticking the "should be running OK because it got started
by me without any early failure" checkbox.

The main issue (though sometimes workable) of sd_notify approach is
that in your composite application you may not have a direct "consider
me ready" hook throughout the underlying stack, and tying it with
processing of the first request is out of question because its timing
is not guaranteed (if it's ever to arrive).
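
To make that a bit more tangible, here is a toy sketch of the "declare
readiness only right before the main loop" idea, with systemd-notify(1)
standing in for a real sd_notify(3) call (everything below is made up
for illustration, and mind the usual NotifyAccess= caveats when
notifying from anything but the unit's main process):

  #!/bin/sh
  sleep 2                 # stand-in for real initialization (sockets, state, ...)
  systemd-notify --ready  # only now does systemd consider the service started
  exec sleep infinity     # stand-in for the actual request processing loop

The hard part, as argued above, is that only the daemon itself knows
where that "ready" boundary really lies.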

Sorry, didn't add much to the discussion, meant to defend sd_notify's
perceived supremacy.  Getting rid of asynchronicities (and related
magic, fragile sleeps on various places) is tough in the world that
wasn't widely interested in unified "as a service guarantee, reporting
true ready" signalling, IMHO.  (Verging on whatifs... what if some
supervisor-agnostic interface was designed in the prehistory, now
it could have been derived also by systemd, just as how such unified
logging interface, syslog, is widespread and functional up to these
days).

-- 
Poki


pgpwaUvD8lYfr.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-11-06 Thread Jan Pokorný
On 07/11/17 01:02 +0300, Andrei Borzenkov wrote:
> 06.11.2017 22:38, Valentin Vidic пишет:
>> On Fri, Oct 13, 2017 at 02:07:33PM +0100, Adam Spiers wrote:
>>> I think it depends on exactly what you mean by "synchronous" here. You can
>>> start up a daemon, or a process which is responsible for forking into a
>>> daemon, but how can you know for sure that a service is really up and
>>> running?  Even if the daemon ran for a few seconds, it might die soon after.
>>> At what point do you draw the line and say "OK start-up is now over, any
>>> failures after this are failures of a running service"?  In that light,
>>> "systemctl start" could return at a number of points in the startup process,
>>> but there's probably always an element of asynchronicity in there.
>>> Interested to hear other opinions on this.
>> 
>> systemd.service(5) describes a started (running) service depending
>> on the service type:
>> 
>> simple  - systemd will immediately proceed starting follow-up units (after 
>> exec)
>> forking - systemd will proceed with starting follow-up units as soon as
>>   the parent process exits
>> oneshot - process has to exit before systemd starts follow-up units
>> dbus- systemd will proceed with starting follow-up units after the
>>   D-Bus bus name has been acquired
>> notify  - systemd will proceed with starting follow-up units after this
>>   notification message has been sent
>> 
>> Obviously notify is best here
> 
> forking, dbus and notify all allow daemon to signal to systemd that
> deamon is ready to service request. Unfortunately ...
> 
>> but not all daemons implement sending
>> sd_notify(READY=1) when they are ready to serve clients.
>> 
> 
> ... as well as not all daemons properly daemonize itself or register on
> D-Bus only after they are ready.

Sharing the sentiment about the situation, arising probably primarily
from daemon authors never been pushed to indicate full ability to
provide service precisely because 1/ it's not the primary objective of
init systems -- the only thing they would need to comply with
regarding getting these daemons started (as opposed to real
service-oriented supervisors, which is also the realm of HA, right?),
and 2/ even if it had been desirable to indicate that, no formalized
interface (and in turn, system convolutions) that would become
widespread was devised for that purpose.  On the other hand, sd_notify
seems to reconcile that in my eyes (+1 to Valentin's qualifying it the best of the
above options) as it doesn't impose any other effect (casting extra
interpretation on, say, a fork event makes it possibly not intended or
at least not-well-timed side-effect of the main, intended effect).

To elaborate more, historically, it's customary to perform double fork
in the daemons to make them as isolated from controlling terminals and
what not as possible.  But it may not be desirable to perform anything
security sensitive prior to at least the first fork, hence with
"forking", you've already lost the preciseness of "ready" indication,
unless there is some further synchronization between the parent and
its child processes (I am yet to see that in practice).  So I'd say,
unless the daemon is specifically fine-tuned, both forking and dbus
types of services are bound to carry some amount of asynchronicity as
mentioned.  To the distaste of said service supervisors that strive to
maximize service usefulness over the considerable timeframe, which is
way more than ticking the "should be running OK because it got started
by me without any early failure" checkbox.

The main issue (though sometimes workable) of sd_notify approach is
that in your composite application you may not have a direct "consider
me ready" hook throughout the underlying stack, and tying it with
processing of the first request is out of question because it's timing
is not guaranteed (if it's ever to arrive).

Sorry, didn't add much to the discussion, getting rid of
asynchronicities is tough in the world that wasn't widely interested
in poll/check-less "true ready" state.

-- 
Poki


pgpz3eLXdZbLh.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [libqb] heads-up: logging not working with binutils-2.29 standard linker (ld.bfd)

2017-10-20 Thread Jan Pokorný
On 19/10/17 22:49 +0200, Jan Pokorný wrote:
> On 03/08/17 20:50 +0200, Valentin Vidic wrote:
>>> Proper solution:
>>> - give me few days to investigate better ways to deal with this
> 
> well, that estimate was off... by far :)
> 
> But given the goals of
> - as high level of isolation of the client space from the linker
>   (respectively toolchain) subtleties as possible (no new compilation
>   flags an such on that side)
> - universality, as you don't really want to instruct libqb users
>   to use this set of flags with linker A and this with linker B,
>   and there's no way to hook any computational ad-hoc decision
>   when the compilation is about to happen (and particular linker
>   to be used + it's version are quite obscured in the build pipeline
>   so only configure-like checks would be viable, anyway)
> - mapping the exact scope of the issue for basic combinations of
>   link participants each doing the logging on its own and possibly
>   differing in the linker used to build them into standalone shared
>   libraries or executable using the former + cranking up the runner
>   of the underlying test matrix
> - some reasonable assurance that logging is not silently severed (see
>   the headaches note below)

I would have forgotten the uttermost important one(!):
- BINARY COMPATIBILITY (ABI) is PRESERVED, except for a single "ABI
  nongracefulness" I am aware of but that's more a consequence of
  slightly incorrect assumptions in the logic of QB_LOG_INIT_DATA
  macro function predating this whole affair by a long shot and which
  the patchset finally rectifies:
  if in the run-time dynamic link, following is combined:
  (. libqb, arbitrary variant: pre-/post-fix, binutils < / >= 2.29)
  . an "intermediate" library (something that the end executable links
with) triggering QB_LOG_INIT_DATA macro and being built with
pre-fix libqb (and perhaps only with binutils < 2.29)
  . end executable using no libqb's logging at all, but being built
with post-fix libqb (and arbitrary binutils < / >= 2.29)
  then, unlike when executable is built with pre-fix libqb, the
  special callsite data containing section in the ELF structure
  of the executable is created + its boundary denoting symbols
  defined within, despite the section being empty (did not happen
  with pre-fix libqb), and because the symbols defined within the
  target program have priority over that of shared libraries in the
  symbol resolution fallback scheme, the assertion of QB_LOG_INIT_DATA
  of the mentioned intermediate library will actually be evaluating
  the inequality of boundaries for the section of the executable(!)
  rather than its own (or whatever higher prio symbols are hit,
  presumably only present if the section at that level is non-empty,
  basically a generalization of the story so far);

  the problem then manifests as inability to run said executable
  as it will fail because of the intermediate library inflicted
  assertion (sadly with very unhelpful "Assertion `0' failed"
  message);

  fortunately, there's enough flexibility so as how to fix
  this, either should be fine:
  . have everything in the executable's library dependency closure
that links against libqb assurably linked with one variant of
libqb only (either all pre-fix or post-fix)
  . have the end executable (that does not use logging at all as
discussed precondition) linked using substitution like this:
s/-lqb/-l:libqb.so.0/  (you may need to adapt the number later)
and you may also need to add this CPPFLAG for the executable:
-DQB_KILL_ATTRIBUTE_SECTION
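
Spelled out, the second option could look roughly like this (file and
library names below are purely illustrative):

  # compile the end executable with the callsite-section machinery disabled ...
  gcc -DQB_KILL_ATTRIBUTE_SECTION -c myapp.c

  # ... and link against the explicit soname rather than plain -lqb
  gcc -o myapp myapp.o -lmy_intermediate -l:libqb.so.0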

* * *

Note: QB_LOG_INIT_DATA macro is not that widespread in the client
  space (though pacemaker uses it and corosync did use an internal
  variant that's hopefully ditched in favour of the former:
  https://github.com/corosync/corosync/pull/251) but I would
  recommend using it anywhere the logging is involved as it
  helps to check for preconditions of functional logging
  early at startup of the executable -- hard to predict
  what more breakage is to come from the linker side :-/
  (and on that note, there was an attempt to reconcile
  linker changes in the upstream I had no idea about
  until recently:
  https://github.com/ClusterLabs/libqb/pull/266#issuecomment-337700089
  but only limited subset of the behaviour was restored, which
  doesn't help us with libqb and binutils 2.29.1 still enforces
  us to use the workaround for 2.29 -- on the other hand, no new
  breakage was introduced, so the coexistence remains settled
  as of the fix)

> I believe it was worth the effort.
> 
>>>   fancy linker, it will likely differ from the iterim one above
>>>   (so far, I had quite miserable knowledge of linker

Re: [ClusterLabs Developers] [libqb] heads-up: logging not working with binutils-2.29 standard linker (ld.bfd)

2017-10-19 Thread Jan Pokorný
On 03/08/17 20:50 +0200, Valentin Vidic wrote:
>> Proper solution:
>> - give me a few days to investigate better ways to deal with this

well, that estimate was off... by far :)

But given the goals of
- as high a level of isolation of the client space from the linker
  (respectively toolchain) subtleties as possible (no new compilation
  flags and such on that side)
- universality, as you don't really want to instruct libqb users
  to use one set of flags with linker A and another with linker B,
  and there's no way to hook any computational ad-hoc decision
  when the compilation is about to happen (and the particular linker
  to be used + its version are quite obscured in the build pipeline,
  so only configure-like checks would be viable, anyway)
- mapping the exact scope of the issue for basic combinations of
  link participants, each doing the logging on its own and possibly
  differing in the linker used to build them into standalone shared
  libraries or an executable using the former, + cranking up the
  runner of the underlying test matrix
- some reasonable assurance that logging is not silently severed (see
  the headaches note below)
I believe it was worth the effort.

>>   fancy linker, it will likely differ from the interim one above
>>   (so far, I had quite miserable knowledge of linker script and
>>   other internals, getting better but not without headaches);
>>   we should also ensure there's a safety net because realizing
>>   there are logs missing when they are expected the most
>>   ... priceless
> 
> Thank you for the effort.  There is no rush here, we just won't
> be able to upload new version to Debian unstable.

The reconciling patchset is not merged yet, but I'd say it's in
good shape: https://github.com/ClusterLabs/libqb/pull/266

Testing is requested, of course ;)

-- 
Jan (Poki)


pgpDYI4dAL_gu.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [ClusterLabs] [HA/ClusterLabs Summit] Key-Signing Party, 2017 Edition

2017-09-06 Thread Jan Pokorný
On 24/07/17 16:59 +0200, Jan Pokorný wrote:
> On 23/07/17 12:32 +0100, Adam Spiers wrote:
>> Jan Pokorný  wrote:
>>> So, going to attend summit and want your key signed while reciprocally
>>> spreading the web of trust?
>>> Awesome, let's reuse the steps from the last time:
>>> 
>>> Once you have a key pair (and provided that you are using GnuPG),
>>> please run the following sequence:
>>> 
>>>   # figure out the key ID for the identity to be verified;
>>>   # IDENTITY is either your associated email address/your name
>>>   # if only single key ID matches, specific key otherwise
>>>   # (you can use "gpg -K" to select a desired ID at the "sec" line)
>>>   KEY=$(gpg --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5)
>> 
>> AFAICS this has two problems: it's missing a --list-key option,
> 
> Bummer!  I've been checking the original thread(s) for responses from
> others, but forgot to check my own:
> http://lists.linux-ha.org/pipermail/linux-ha/2015-January/048511.html
> 
> Thanks for spotting (and the public key already sent), Adam.
> 
>> and it doesn't handle multiple matches for 'IDENTITY'.  So to make it
>> choose the newest key if there are several:
>> 
>>read IDENTITY
>>KEY=$(gpg --with-colons --list-key "$IDENTITY" | grep '^pub' |
>>  sort -t: -nr -k6 | head -n1 | cut -d: -f5)
> 
> Good point.  Hopefully affected persons, allegedly heavy users of GPG,
> are capable of adapting on-the-fly anyway :-)
> 
>>>  # export the public key to a file that is suitable for exchange
>>>  gpg --export -a -- $KEY > $KEY
>>> 
>>>  # verify that you have an expected data to share
>>>  gpg --with-fingerprint -- $KEY

Thanks to the attendees, and I am sorry for not responding to the ones
with on-the-edge submissions -- there was actually an active one
accepted and I've refreshed the authoritative record about the event
at https://people.redhat.com/jpokorny/keysigning/2017-ha/ accordingly
(see '*2.*' suffixes).

I'd also kindly ask the actual attendees (one person skipped the
event) to do the remaining signing work within the month at the latest.
You can just grab the key of the other, already verified party from
the linked source (or the well-known key server if present), sign it,
and then (IMHO) preferably send the signed key back to the original
person at one of his/her listed email addresses, again (IMHO)
preferably in an encrypted form.  There are various tools to help
with this workflow at
scale, such as PIUS (https://github.com/jaymzh/pius) to give an
example, but YMMV.
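
For those preferring to do it by hand, a minimal sketch of that
workflow could be as follows (KEYID and file names are illustrative;
do double-check the fingerprint against the printed list first):

  gpg --import -- KEYID.asc          # import the already verified key
  gpg --sign-key -- KEYID            # sign it
  gpg --export -a -- KEYID > KEYID.signed.asc
  # and send it back, preferably encrypted to that very key:
  gpg -ea -r KEYID -o KEYID.signed.asc.gpg KEYID.signed.asc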

May the web of trust be with you.

-- 
Jan (Poki)


pgpSmOoCsGdgJ.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [libqb] heads-up: logging not working with binutils-2.29 standard linker (ld.bfd)

2017-08-03 Thread Jan Pokorný
On 03/08/17 18:40 +0200, Valentin Vidic wrote:
> On Tue, Aug 01, 2017 at 11:07:24PM +0200, Jan Pokorný wrote:
>> https://bugzilla.redhat.com/1477354
> 
> Thanks for the info.  We are seeing similar problems with the
> pacemaker build on Debian now:
> 
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=869986

Yep, the same issue, because of compiling pacemaker with the same
fancy linker that decided to hide those symbols without notices,
I guess.

Can you somehow scrape the number of projects in Debian that
suffer from __{start,stop}_ symbols missing?  That might
help to convince binutils maintainers there's something wrong.
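
For a single binary, a quick check could look something like the
following (the "__verbose" section name is used merely as an example
of what libqb-based logging relies on, and the path is illustrative):

  nm -D /usr/lib/libwhatever.so.0 | grep -E '__(start|stop)___verbose' \
      || echo "boundary symbols not exported"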

> Guess we'll need to fix pacemaker libs to get this fixed?

Interim non-production-ready solution (e.g. the equivalent of Fedora
Rawhide):
- use patch(es) from
  https://github.com/jnpkrn/libqb/commits/workaround-ld-2.29
  above "Doc tweaking (#261)"
- short path: generate and grab "qblog.t", build pacemaker
  (and possibly other dependants) with LDFLAGS=,
- longer proper path (see the rough sketch below):
  . build updated libqb with that
  . rebuild pacemaker on top of that rebuilt libqb
    (hopefully it uses pkg-config and hopefully the patched
    libqb.pc will be OK, untested)
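
For illustration, that longer path could boil down to something
roughly like this (branch name taken from above; the exact configure
invocations are just a sketch, not packaging-grade instructions):

  git clone -b workaround-ld-2.29 https://github.com/jnpkrn/libqb.git
  (cd libqb && ./autogen.sh && ./configure && make && make install)
  # then, in a pacemaker source checkout, rebuild it so it picks up
  # the patched libqb.pc via pkg-config:
  (cd pacemaker && ./autogen.sh && ./configure && make)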

Proper solution:
- give me a few days to investigate better ways to deal with this
  fancy linker, it will likely differ from the interim one above
  (so far, I had quite miserable knowledge of linker script and
  other internals, getting better but not without headaches);
  we should also ensure there's a safety net because realizing
  there are logs missing when they are expected the most
  ... priceless

Thanks

-- 
Jan (Poki)


pgpk7ElxDixVU.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [libqb] heads-up: logging not working with binutils-2.29 standard linker (ld.bfd)

2017-08-01 Thread Jan Pokorný
On 31/07/17 22:26 +0200, Jan Pokorný wrote:
> On 31/07/17 21:55 +0200, Jan Pokorný wrote:
>> This might be of interest *now* if you are fiddling with bleeding
>> edge, or *later* when the distros adopt that version of binutils or
>> newer:  Root cause is currently unknown, but the good news is that
>> the failure will be captured by the test suite.  At least this was
>> the case with the recent mass rebuild in Fedora Rawhide.
>> 
>> Will post more details/clarifications/rectifications when I know more.
> 
> So, after reverting following patches (modulo test suite files that
> can be skipped easily) from 2.29:
> 
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=7dba9362c172f1073487536eb137feb2da30b0ff
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=b27685f2016c510d03ac9a64f7b04ce8efcf95c4
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=cbd0eecf261c2447781f8c89b0d955ee66fae7e9
> 
> I got log.test running happily again.  Will try to identify which one
> is to be blamed and follow up with binutils/ld maintainer.

https://bugzilla.redhat.com/1477354

> There's also an obligation on the libqb side to make the configure
> test much more bullet-proof, as having logging silently directed at
> "virtual /dev/null" could be quite painful.  We might go as far
> as refusing to compile when the section attribute is supported by the
> compiler/GCC but the linker is a show stopper -- I suspect the
> performance is the key driver for using that mechanism, so silent
> regression in this area might be undesirable as well.

-- 
Jan (Poki)


pgpfRJNp5TEQx.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [libqb] heads-up: logging not working with binutils-2.29 standard linker (ld.bfd)

2017-07-31 Thread Jan Pokorný
On 31/07/17 21:55 +0200, Jan Pokorný wrote:
> This might be of interest *now* if you are fiddling with bleeding
> edge, or *later* when the distros adopt that version of binutils or
> newer:  Root cause is currently unknown, but the good news is that
> the failure will be captured by the test suite.  At least this was
> the case with the recent mass rebuild in Fedora Rawhide.
> 
> Will post more details/clarifications/rectifications when I know more.

So, after reverting following patches (modulo test suite files that
can be skipped easily) from 2.29:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=7dba9362c172f1073487536eb137feb2da30b0ff
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=b27685f2016c510d03ac9a64f7b04ce8efcf95c4
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=cbd0eecf261c2447781f8c89b0d955ee66fae7e9

I got log.test running happily again.  Will try to identify which one
is to be blamed and follow up with binutils/ld maintainer.

There's also an obligation on the libqb side to make the configure
test much more bullet-proof, as having logging silently directed at
"virtual /dev/null" could be quite painful.  We might go as far
as refusing to compile when the section attribute is supported by the
compiler/GCC but the linker is a show stopper -- I suspect the
performance is the key driver for using that mechanism, so silent
regression in this area might be undesirable as well.

-- 
Jan (Poki)


pgp_67s29sbqY.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] [libqb] heads-up: logging not working with binutils-2.29 standard linker (ld.bfd)

2017-07-31 Thread Jan Pokorný
This might be of interest *now* if you are fiddling with bleeding
edge, or *later* when the distros adopt that version of binutils or
newer:  Root cause is currently unknown, but the good news is that
the failure will be captured by the test suite.  At least this was
the case with the recent mass rebuild in Fedora Rawhide.

Will post more details/clarifications/rectifications when I know more.

-- 
Jan (Poki)


pgpniZ2EzEVGQ.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] bundle/docker: zombie process on resource stop

2017-07-28 Thread Jan Pokorný
On 27/07/17 17:40 -0500, Ken Gaillot wrote:
> On Thu, 2017-07-27 at 23:26 +0200, Jan Pokorný wrote:
>> On 24/07/17 17:59 +0200, Valentin Vidic wrote:
>>> On Mon, Jul 24, 2017 at 09:57:01AM -0500, Ken Gaillot wrote:
>>>> Are you sure you have pacemaker 1.1.17 inside the container as well? The
>>>> pid-1 reaping stuff was added then.
>>> 
>>> Yep, the docker container from the bundle example got an older
>>> version installed, so mystery solved :)
>>> 
>>>   pacemaker-remote-1.1.15-11.el7_3.5.x86_64
>> 
>> As with docker/moby kind of bundles, since pacemaker on the host knows
>> whether it sets pacemaker_remoted as the command to be run within the
>> container or not, it would be possible for it in such a case to check
>> whether this remote peer is recent enough to cope with zombie reaping
>> and prevent it from running any resources if not.
> 
> Leaving zombies behind is preferable to being unable to use containers
> with an older pacemaker_remoted installed. A common use case of
> containers is to run some legacy application that requires an old OS
> environment. The ideal usage there would be to compile a newer pacemaker
> for it, but many users won't have that option.

I was talking about the in-bundle use case (as opposed to the generic
pacemaker-remote one) in particular, where it might be preferable
to have such a sanity check in place as opposed to hard-to-predict
consequences, such as when the resource cannot be stopped due to
interference with zombies (well, there is a whole lot of other issues
with this weak grip on processes, such as the resource agents on the
host getting seriously confused by the processes running in the
local containers!).

For the particular, specific use case at hand, it might be reasonable
to require a pacemaker-remote version that actually got bundle-ready,
IMHO.

>> The catch -- pacemaker on host cannot likely evaluate this "recent
>> enough" part of the equation properly as there was no LRMD protocol
>> version bump for 1.1.17.  Correct?  Any other hints it could use?

-- 
Poki


pgpIkjUa3IAuC.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] bundle/docker: zombie process on resource stop

2017-07-27 Thread Jan Pokorný
On 24/07/17 17:59 +0200, Valentin Vidic wrote:
> On Mon, Jul 24, 2017 at 09:57:01AM -0500, Ken Gaillot wrote:
>> Are you sure you have pacemaker 1.1.17 inside the container as well? The
>> pid-1 reaping stuff was added then.
> 
> Yep, the docker container from the bundle example got an older
> version installed, so mystery solved :)
> 
>   pacemaker-remote-1.1.15-11.el7_3.5.x86_64

As with docker/moby kind of bundles, since pacemaker on the host knows
whether it sets pacemaker_remoted as the command to be run within the
container or not, it would be possible for it in such a case to check
whether this remote peer is recent enough to cope with zombie reaping
and prevent it from running any resources if not.

The catch -- pacemaker on host cannot likely evaluate this "recent
enough" part of the equation properly as there was no LRMD protocol
version bump for 1.1.17.  Correct?  Any other hints it could use?

-- 
Poki


pgpNm0NnBp1O0.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] [ClusterLabs] [HA/ClusterLabs Summit] Key-Signing Party, 2017 Edition

2017-07-24 Thread Jan Pokorný
On 23/07/17 12:32 +0100, Adam Spiers wrote:
> Jan Pokorný  wrote:
>> So, going to attend summit and want your key signed while reciprocally
>> spreading the web of trust?
>> Awesome, let's reuse the steps from the last time:
>> 
>> Once you have a key pair (and provided that you are using GnuPG),
>> please run the following sequence:
>> 
>>   # figure out the key ID for the identity to be verified;
>>   # IDENTITY is either your associated email address/your name
>>   # if only single key ID matches, specific key otherwise
>>   # (you can use "gpg -K" to select a desired ID at the "sec" line)
>>   KEY=$(gpg --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5)
> 
> AFAICS this has two problems: it's missing a --list-key option,

Bummer!  I've been checking the original thread(s) for responses from
others, but forgot to check my own:
http://lists.linux-ha.org/pipermail/linux-ha/2015-January/048511.html

Thanks for spotting (and the public key already sent), Adam.

> and it doesn't handle multiple matches for 'IDENTITY'.  So to make it
> choose the newest key if there are several:
> 
>read IDENTITY
>KEY=$(gpg --with-colons --list-key "$IDENTITY" | grep '^pub' |
>  sort -t: -nr -k6 | head -n1 | cut -d: -f5)

Good point.  Hopefully affected persons, allegedly heavy users of GPG,
are capable of adapting on-the-fly anyway :-)

>>  # export the public key to a file that is suitable for exchange
>>  gpg --export -a -- $KEY > $KEY
>> 
>>  # verify that you have an expected data to share
>>  gpg --with-fingerprint -- $KEY

-- 
Jan (Poki)


pgpMxrReDwmaM.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] [HA/ClusterLabs Summit] Key-Signing Party, 2017 Edition

2017-07-21 Thread Jan Pokorný
Hello cluster masters :-)

as there's a little less than 7 weeks left to "The Summit" meetup
(), it's about time to get the ball
rolling so we can voluntarily augment the digital trust amongst
us the attendees, on an OpenPGP basis.

Doing that, we'll actually establish a tradition, since this will
be the second time such an event is being kicked off (unlike the
birds of a feather gathering itself, which was edu-feathered back then):

  
  

If there are no objections, yours truly will conduct this undertaking.
(As an aside, I am toying with the idea of optimizing the process
a bit now that many keys are cross-signed already; I doubt there's
value in adding identical signatures just with different timestamps,
unless, of course, the inscribed level of trust is going to change,
presumably elevate -- any comments?)

* * *

So, going to attend summit and want your key signed while reciprocally
spreading the web of trust?
Awesome, let's reuse the steps from the last time:

Once you have a key pair (and provided that you are using GnuPG),
please run the following sequence:

# figure out the key ID for the identity to be verified;
# IDENTITY is either your associated email address/your name
# if only single key ID matches, specific key otherwise
# (you can use "gpg -K" to select a desired ID at the "sec" line)
KEY=$(gpg --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5)

# export the public key to a file that is suitable for exchange
gpg --export -a -- $KEY > $KEY

# verify that you have an expected data to share
gpg --with-fingerprint -- $KEY

with IDENTITY adjusted as per the instruction above, and send me the
resulting $KEY file, preferably in a signed (or even encrypted[*]) email
from an address associated with that very public key of yours.

Timeline?
Please, send me your public keys *by 2017-09-05*, off-list and
best with [key-2017-ha] prefix in the subject.  I will then compile
a list of the attendees together with their keys and publish it at

so it can be printed beforehand.

[*] You can find my public key at public keyservers:

Indeed, the trust in this key should be ephemeral/one-off
(e.g. using a temporary keyring, not a universal one before we
proceed with the signing :)
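
    (e.g., something along these lines, with the keyring path and
    KEYID being illustrative only:
      gpg --no-default-keyring --keyring /tmp/ha-2017-tmp.gpg \
          --recv-keys KEYID
    )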

* * *

Thanks for your cooperation, looking forward to this side stage
(but nonetheless important if release or commit[1] signing is to get
traction) happening, and I hope this will be beneficial to all involved.

See you there!


[1] for instance, see:



-- 
Jan (Poki)


pgpAflvBotm3a.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] Coming in pacemaker: CIB schema versioning no longer compatible with string- and float-based comparisons

2017-07-20 Thread Jan Pokorný
Originally, pacemaker was treating versions of its CIB schemas
(respectively named files) as floats, simplifying comparisons in
a simplistic single-digit minor version realm.  As the necessity
to roll out a new schema version catering to the new needs is around
the corner, it was decided[1] that pacemaker will not bump the
major part abruptly (against the rules[2]), but rather an overflow
to a new higher digit in the minor part will occur.  That breaks
the assumption of easy comparisons from two previously common angles:

- string (e.g. when looking at /cib@validate-with XPath within CIB
  or at the available, respectively named schema files);
  example: pacemaker-2.10 < pacemaker-2.2

- mentioned float (when parsing the numerical part out of the string):
  example: 2.11 < 2.2

Apparently, what can intuitively be labelled "version sort" remains
compatible, but as it is not so convenient implementation-wise,
it may be missing at various relevant places.
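
For a quick illustration of the difference (GNU coreutils' sort
used here):

  $ printf 'pacemaker-2.2\npacemaker-2.10\n' | sort      # string sort
  pacemaker-2.10
  pacemaker-2.2
  $ printf 'pacemaker-2.2\npacemaker-2.10\n' | sort -V   # version sort
  pacemaker-2.2
  pacemaker-2.10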

The takeaway is:
If you maintain some pieces of software that deal with schema versions
or files in any way, you are advised to double-check if you are ready
for this change.

For instance, it wasn't exactly the case of clufter:
https://pagure.io/clufter/c/f30d6eea9aa668696717b81ccc18ede724eb?branch=next


[1] https://github.com/ClusterLabs/pacemaker/pull/1308
[2] https://github.com/ClusterLabs/pacemaker/blob/master/xml/Readme.md

-- 
Jan (Poki)


pgpolxeLRyM0w.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] bundle/rkt: port-mapping numbers/names

2017-07-19 Thread Jan Pokorný
On 19/07/17 09:49 -0500, Ken Gaillot wrote:
> On 07/19/2017 01:20 AM, Valentin Vidic wrote:
>> Another issue with the rkt containers is the port-mapping.  Each container
>> defines exposed ports:
>> 
>>  "ports": [
>>  {
>>  "name": "http",
>>  "protocol": "tcp",
>>  "port": 80,
>>  "count": 1,
>>  "socketActivated": false
>>  },
>>  ]
>> 
>> These are than mapped using the "name" from the definition:
>> 
>>   --port=   ports to expose on the host (requires 
>> contained network). Syntax: --port=NAME:[HOSTIP:]HOSTPORT
>> 
>> The problem now is that the xml defines the port to be a number:
>> 
>>   
>> 
>> Workaround is to use "80" as a port name, but perhaps we could allow
>> port to be a string or introduce a new attribute:
>> 
>>   
>> 
>> What do you think?
> 
> Hmm, this was a questionable design choice on our part. There was some
> question as to what to include in the docker tag (and thus could be
> different under different container technologies) and what to put
> outside of it (and thus should be supported by all technologies).
> 
> I'm guessing the situation is that your code needs to do something about
> the port mapping (otherwise you could just omit port-mapping with rkt),
> and the rkt "ports" configuration is pre-existing (otherwise your code
> could generate it with an arbitrary name).
> 
> I would think this would also affect the control-port attribute.
> 
> I see these alternatives, from simplest to most complicated:
> 
> * Just document the issue and require rkt configurations to have name
> equal to port number.

I don't think that alone would suffice; I'd expect at least a
(port,transport) pair to be reasonably unique, as long as you can remap
TCP/UDP independently (I am not sure, but it would be no surprise) --
but hey, we have just hit another limitation of the current schema
(the transport layer not being taken into account -- is TCP silently
assumed, then?).

> * Is it possible for the code to take the port number from port-mapping
> and query the rkt configuration to find the appropriate name?
> 
> * Is it possible for the code to generate a duplicate/override "ports"
> configuration with a generated name?
> 
> * Relax the port attribute to  and let the container
> implementation validate it further as needed. A downside is that some
> Docker config errors wouldn't be caught in the schema validation phase.
> (I think I prefer this over a separate port-name attribute.)
> 
> * Restructure the RNG so that the choice is between
>  and
> . It would be ugly and
> involve some duplication, but it would satisfy both implementations.

A similar approach was discussed with another proposed change:
http://oss.clusterlabs.org/pipermail/users/2017-April/005552.html
(item 1., i.e., separating the pacemaker-level pseudogenerics from
the tag for a particular engine), which still might be appealing,
especially as/if the schema gets changed anyway.

Valentin, is rkt able to serve containers from one image/location
in multiple instances in parallel?

> * Modify the schema so  is enclosed within the technology tag,
> and provide an XSL transform for existing configurations.
> 
> The last two options have the advantage of letting us move the 
> "network" attribute to the  tag.

-- 
Jan (Poki)


pgpFCzmKCQDpS.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] ocf_take_lock is NOT actually safe to use

2017-06-22 Thread Jan Pokorný
On 21/06/17 16:40 +0200, Lars Ellenberg wrote:
> Repost to a wider audience, to raise awareness for this.

Appreciated, Lars.
Adding developers ML for possibly even larger outreach.

> ocf_take_lock may or may not be better than nothing.
> 
> It at least "annotates" that the auther would like to protect something
> that is considered a "critical region" of the resource agent.
> 
> At the same time, it does NOT deliver what the name seems to imply.
> 
> I think I brought this up a few times over the years, but was not noisy
> enough about it, because it seemed not important enough: no-one was
> actually using this anyways.

True, I have found this reference (a leaf in the whole thread):
http://lists.linux-ha.org/pipermail/linux-ha-dev/2010-October/017801.html

> But since new usage has been recently added with
> [ClusterLabs/resource-agents] targetcli lockfile (#917)

[linked: https://github.com/ClusterLabs/resource-agents/pull/917]

> here goes:
> 
> On Wed, Jun 07, 2017 at 02:49:41PM -0700, Dejan Muhamedagic wrote:
>> On Wed, Jun 07, 2017 at 05:52:33AM -0700, Lars Ellenberg wrote:
>>> Note: ocf_take_lock is NOT actually safe to use.
>>> 
>>> As implemented, it uses "echo $pid > lockfile" to create the lockfile,
>>> which means if several such "ocf_take_lock" happen at the same time,
>>> they all "succeed", only the last one will be the "visible" one to future 
>>> waiters.
>> 
>> Ugh.
> 
> Exactly.
> 
> Reproducer:
> #
> #!/bin/bash
> export OCF_ROOT=/usr/lib/ocf/ ;
> .  /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs ;
> 
> x() (
>   ocf_take_lock dummy-lock ;
>   ocf_release_lock_on_exit dummy-lock  ;
>   set -C;
>   echo x > protected && sleep 0.15 && rm -f protected || touch BROKEN;
> );
> 
> mkdir -p /run/ocf_take_lock_demo
> cd /run/ocf_take_lock_demo
> rm -f BROKEN; i=0;
> time while ! test -e BROKEN; do
>   x &  x &
>   wait;
>   i=$(( i+1 ));
> done ;
> test -e BROKEN && echo "reproduced race in $i iterations"
> #
> 
> x() above takes, and, because of the () subshell and
> ocf_release_lock_on_exit, releases the "dummy-lock",
> and within the protected region of code,
> creates and removes a file "protected".
> 
> If ocf_take_lock was good, there could never be two instances
> inside the lock, so echo x > protected should never fail.
> 
> With the current implementation of ocf_take_lock,
> it takes "just a few" iterations here to reproduce the race.
> (usually within a minute).
> 
> The races I see in ocf_take_lock:
> "creation race":
>   test -e $lock
>   # someone else may create it here
>   echo $$ > $lock
>   # but we override it with ours anyways
> 
> "still empty race":
>   test -e $lock   # maybe it already exists (open O_CREAT|O_TRUNC)
>   # but does not yet contain target pid,
>   pid=`cat $lock` # this one is empty,
>   kill -0 $pid# and this one fails
>   and thus a "just being created" one is considered stale
> 
> There are other problems around "stale pid file detection",
> but let's not go into that minefield right now.
> 
>>> Maybe we should change it to 
>>> ```
>>> while ! ( set -C; echo $pid > lockfile ); do
>>> if test -e lockfile ; then
>>> : error handling for existing lockfile, stale lockfile detection
>>> else
>>> : error handling for not being able to create lockfile
>>> fi
>>> done
>>> : only reached if lockfile was successfully created
>>> ```
>>> 
>>> (or use flock or other tools designed for that purpose)
>> 
>> flock would probably be the easiest. mkdir would do too, but for
>> upgrade issues.
> 
> and, being part of util-linux, flock should be available "everywhere".
> 
> but because writing "wrappers" around flock similar to the intended
> semantics of ocf_take_lock and ocf_release_lock_on_exit is not easy
> either, usually you'd be better of using flock directly in the RA.
> 
> so, still trying to do this with shell:
> 
> "set -C" (respectively set -o noclober):
>   If set, disallow existing regular files to be overwritten
>   by redirection of output.

For completeness, also guaranteed with POSIX specification:
http://pubs.opengroup.org/onlinepubs/009695399/utilities/set.html
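
(A tiny illustration, safe to try interactively; the exact error
message differs per shell:

  $ set -C
  $ echo $$ > /tmp/demo.lock    # first writer wins
  $ echo $$ > /tmp/demo.lock    # any further attempt fails
  sh: cannot create /tmp/demo.lock: File exists
)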

> normal '>' means: O_WRONLY|O_CREAT|O_TRUNC,

From
https://github.com/ClusterLabs/resource-agents/pull/622#issuecomment-113166800
I actually got an impression that this is shell-specific.

> set -C '>' means: O_WRONLY|O_CREAT|O_EXCL

The only thing I can add at this point (it needs more time to read up
on the proposals) is that this is another con for using "standard"
shell as an implementation language, along with, e.g., it being prone
to easily mishandling whitespace in the parameters being passed:
http://oss.clusterlabs.org/pipermail/users/2015-May/000403.html

> using "set -C ; echo $$ > $lock" instead of 
> "test -e $lock || echo $$ > $lock"
> g

Re: [ClusterLabs Developers] checking all procs on system enough during stop action?

2017-04-24 Thread Jan Pokorný
On 24/04/17 17:32 +0200, Jehan-Guillaume de Rorthais wrote:
> On Mon, 24 Apr 2017 17:08:15 +0200
> Lars Ellenberg  wrote:
> 
>> On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais wrote:
>>> Hi all,
>>> 
>>> In the PostgreSQL Automatic Failover (PAF) project, one of most frequent
>>> negative feedback we got is how difficult it is to experience with it
>>> because of fencing occurring way too frequently. I am currently hunting
>>> this kind of useless fencing to make life easier.
>>> 
>>> It occurs to me, a frequent reason of fencing is because during the stop
>>> action, we check the status of the PostgreSQL instance using our monitor
>>> function before trying to stop the resource. If the function does not return
>>> OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we just raise an error,
>>> leading to a fencing. See:
>>> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301
>>> 
>>> I am considering adding a check to define if the instance is stopped even
>>> if the monitor action returns an error. The idea would be to parse **all**
>>> the local processes looking for at least one pair of
>>> "/proc//{comm,cwd}" related to the PostgreSQL instance we want to
>>> stop. If none are found, we consider the instance is not running.
>>> Gracefully or not, we just know it is down and we can return OCF_SUCCESS.
>>> 
>>> Just for completeness, the piece of code would be:
>>> 
>>>my @pids;
>>>foreach my $f (glob "/proc/[0-9]*") {
>>>push @pids => basename($f)
>>>if -r $f
>>>and basename( readlink( "$f/exe" ) ) eq "postgres"
>>>and readlink( "$f/cwd" ) eq $pgdata;
>>>}
>>> 
>>> I feels safe enough to me.
> 
> [...]
> 
> But anyway, here or there, I would have to add this piece of code looking at
> each processes. According to you, is it safe enough? Do you see some hazard
> with it?

Just for the sake of completeness, there's a race condition, indeed,
in multiple repeated path traversals (without being fixed on a particular
entry inode), which can be interleaved with a new postgres process being
launched anew (or what not).  But that may happen even before the code
in question is executed -- naturally, not having a firm grip on the
process is open to such possible issues, so this is just an aside.

-- 
Jan (Poki)


pgpwmC7RyNunW.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] FenceAgentAPI

2017-03-07 Thread Jan Pokorný
On 06/03/17 17:12 -0500, Digimer wrote:
>   The old FenceAgentAPI document on fedorahosted is gone now that fedora
> hosted is closed. So I created a copy on the clusterlabs wiki:
> 
> http://wiki.clusterlabs.org/wiki/FenceAgentAPI

Note that just a few days ago I announced that the page has moved to
https://docs.pagure.org/ClusterLabs.fence-agents/FenceAgentAPI.md, see
http://oss.clusterlabs.org/pipermail/developers/2017-February/000438.html
(that hit just the developers list, I don't think it's of interest to
users of the stack as such).  Therefore that's another duplicate, just
as http://wiki.clusterlabs.org/wiki/Fedorahosted.org_FenceAgentAPI
(linked from the original fedorahosted.org page so as to allow for
future flexibility should the content still be visible, which turned
out to not be the case) is.

I will add you (or whoever wants to maintain that file) to the
linux-cluster group at pagure.io so you can edit the underlying Markdown
file (just let me know off-list your Fedora Account System username).
The file itself is tracked in a git repository; access URLs were
provided in the announcement email.

>   It desperately needs an update. Specifically, it needs '-o metadata'
> properly explained. I am happy to update this document and change the
> cman/cluster.conf example over to a pacemaker example, etc, but I do not
> feel like I am authoritative on the XML validation side of things.
> 
>   Can someone give me, even just point-form notes, how to explain this?
> If so, I'll create 'FenceAgentAPI - Working' document and I will have
> anyone interested comment before making it an official update.
> 
> Comments?

-- 
Jan (Poki)


pgpR6PrKqVgj_.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] Pagure.io as legacy codebases/distribution files/documentation hosting (Was: Moving cluster project)

2017-02-28 Thread Jan Pokorný
On 28/02/17 03:18 +0100, Jan Pokorný wrote:
> On 17/01/17 22:27 +0100, Jan Pokorný wrote:
>> On 17/01/17 21:14 +, Andrew Price wrote:
>>> On 17/01/17 19:58, Jan Pokorný wrote:
>>>> So I think we should arrange for a move to pagure.io for this cluster
>>>> project as well if possible, if only to retain the ability to change
>>>> something should there be a need.
>>> 
>>> Good plan.
>>> 
>>> I can pursue this if there are no complaints.  Just let me know
>>>> (off-list) who aspires to cluster-maint group (to be created)
>>>> membership.
>>> 
>>> Could you give the gfs2-utils-maint group push access to the cluster project
>>> once it's been set up? (It is possible to add many groups to a project.) I
>>> think that would be the most logical way to do it.
>> 
>> Sure and thanks for a cumulative access assignment tip.
>> 
>> I'll proceed on Friday or early next week, then.
> 
> Well, scheduler of mine didn't get to it until now, so sorry
> to anyone starting to worry.
> 
> So what's been done:
> 
> - git repo moved over to https://pagure.io/linux-cluster/cluster
>   + granted commit rights for gfs2-utils-maint group
> (and will add some more folks to linux-cluster group,
> feel free to bug me off-list about that)
>   + mass-committed an explanation change to every branch at
> the discontinued fedorahosted.org (fh.o) provider I could,
> as some are already frozen
> (https://git.fedorahosted.org/cgit/cluster.git/)
>   . I've decided to use a namespace (because there are possibly
> more projects to be migrated under that label),

Actually, there are quite a few legacy projects copied over, some
merely for plain archival bits-preserving:
https://pagure.io/group/linux-cluster
[did I miss anything?  AFAIK, gfs2-utils and dlm components migrated
on their own, and corosync has been on GitHub for years]

Actually also some components otherwise found under ClusterLabs label
(note that *-agents are common to both worlds) are affected, and for
that I created a separate ClusterLabs group on pagure.io:
https://pagure.io/group/ClusterLabs

The respective projects there are just envelopes that I used for
uploading distribution files and/or documentation that were so
far served by fedorahosted.org [*], not used for active code
hosting (at this time, anyway).

[*] locations like:
https://fedorahosted.org/releases/q/u/quarterback/
https://fedorahosted.org/releases/f/e/fence-agents/

> and have stuck with linux-cluster referring to the mailing list
> of the same name that once actively served to discuss the
> cluster stack in question (and is quite abandoned nowadays)
> 
> - quickly added backup location links at
>   https://fedorahosted.org/cluster/ and
>   https://fedorahosted.org/cluster/wiki/FenceAgentAPI,

I've converted the latter to Markdown and exposed at
https://docs.pagure.org/ClusterLabs.fence-agents/FenceAgentAPI.md
The maintenance or just source access should be as simple as
cloning from ssh://g...@pagure.io/docs/ClusterLabs/fence-agents.git
or https://pagure.io/docs/ClusterLabs/fence-agents.git, respectively.

>   i.e., the pages that seem most important to me, to allow for
>   smooth "forward compatibility"; the links currently refer to vain
>   stubs at ClusterLabs wiki, but that can be solved later on -- I am
>   still unsure if trac wikis at fh.o will be served in the next
>   phase or shut down right away and apparently this measure will
>   help only in the former case
> 
> What to do:
> - move releases over to pagure.io as well:
>   https://fedorahosted.org/releases/c/l/cluster/

Done for cluster:
http://releases.pagure.org/linux-cluster/cluster/

Tarballs for split components from here will eventually be uploaded
to respective release directories for the particular projects, e.g.,
http://releases.pagure.org/ClusterLabs/fence-agents/, it's a WIP.

> - possibly migrate some original wiki content to proper
>   "doc pages" exposed directly through pagure.io

So far I am just collecting the cluster wiki texts for possible
later resurrecting.

> - resolve the question of the linked wiki stubs and
>   cross-linking as such
> 
> Any comments?  Ideas?

-- 
Jan (Poki)


pgp6lzFuvGOAh.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Moving cluster project (Was: Moving gfs2-utils away from fedorahosted.org)

2017-02-27 Thread Jan Pokorný
On 17/01/17 22:27 +0100, Jan Pokorný wrote:
> On 17/01/17 21:14 +, Andrew Price wrote:
>> On 17/01/17 19:58, Jan Pokorný wrote:
>>> So I think we should arrange for a move to pagure.io for this cluster
>>> project as well if possible, if only to retain the ability to change
>>> something should there be a need.
>> 
>> Good plan.
>> 
> I can pursue this if there are no complaints.  Just let me know
>>> (off-list) who aspires to cluster-maint group (to be created)
>>> membership.
>> 
>> Could you give the gfs2-utils-maint group push access to the cluster project
>> once it's been set up? (It is possible to add many groups to a project.) I
>> think that would be the most logical way to do it.
> 
> Sure and thanks for a cumulative access assignment tip.
> 
> I'll proceed on Friday or early next week, then.

Well, scheduler of mine didn't get to it until now, so sorry
to anyone starting to worry.

So what's been done:

- git repo moved over to https://pagure.io/linux-cluster/cluster
  + granted commit rights for gfs2-utils-maint group
(and will add some more folks to linux-cluster group,
feel free to bug me off-list about that)
  + mass-committed an explanation change to every branch at
the discontinued fedorahosted.org (fh.o) provider I could,
as some are already frozen
(https://git.fedorahosted.org/cgit/cluster.git/)
  . I've decided to use a namespace (because there are possibly
more projects to be migrated under that label), and have
stuck with linux-cluster referring to the mailing list of
the same name that once actively served to discuss the
cluster stack in question (and is quite abandoned nowadays)

- quickly added backup location links at
  https://fedorahosted.org/cluster/ and
  https://fedorahosted.org/cluster/wiki/FenceAgentAPI, i.e.,
  the pages that seem most important to me, to allow for
  smooth "forward compatibility"; the links currently refer
  to vain stubs at ClusterLabs wiki, but that can be solved
  later on -- I am still unsure if trac wikis at fh.o will
  be served in the next phase or shut down right away and
  apparently this measure will help only in the former case

What to do:
- move releases over to pagure.io as well:
  https://fedorahosted.org/releases/c/l/cluster/
- possibly migrate some original wiki content to proper
  "doc pages" exposed directly through pagure.io
- resolve the question of the linked wiki stubs and
  cross-linking as such

Any comments?  Ideas?

-- 
Jan (Poki)


pgpHvoCAcoBIM.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] Moving cluster project (Was: Moving gfs2-utils away from fedorahosted.org)

2017-01-17 Thread Jan Pokorný
On 17/01/17 21:14 +, Andrew Price wrote:
> On 17/01/17 19:58, Jan Pokorný wrote:
>> So I think we should arrange for a move to pagure.io for this cluster
>> project as well if possible, if only to retain the ability to change
>> something should there be a need.
> 
> Good plan.
> 
>>> I can pursue this if there are no complaints.  Just let me know
>> (off-list) who aspires to cluster-maint group (to be created)
>> membership.
> 
> Could you give the gfs2-utils-maint group push access to the cluster project
> once it's been set up? (It is possible to add many groups to a project.) I
> think that would be the most logical way to do it.

Sure and thanks for a cumulative access assignment tip.

I'll proceed on Friday or early next week, then.

-- 
Jan (Poki)


pgpq79t5ViFqe.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Moving gfs2-utils away from fedorahosted.org

2017-01-17 Thread Jan Pokorný
[adding developers list at clusterlabs to CC]

On 16/01/17 18:45 +, Andrew Price wrote:
> On 19/09/16 17:48, Andrew Price wrote:
>> Re: https://communityblog.fedoraproject.org/fedorahosted-sunset-2017-02-28/
>> 
>> We'll need to find a new host for the cluster projects that haven't
>> migrated away from fedorahosted.org yet.
>> 
>> The recommended successor to fedorahosted.org is pagure.io which is a
>> Fedora project, open source, uses the same user account system, allows
>> git hooks to be set up, and has the added advantage that we have a
>> direct line to the admins and developers.
>> 
>> [...]
> 
> Progress on this:
> 
> - A new repository has been created at  and
> everything in the gfs2-utils Fedora Hosted repository has been pushed to it.
> This will be kept mirrored until the switch over.
> 
> - A gfs2-utils maintainers group 
> has been set up and given push access to the repository.
> 
> - Filed a ticket  to get
> the release tarballs etc. migrated over (and hopefully a URL redirect set
> up).
> 
> - Disabled the issue tracker and pull request features for the project as we
> currently have no plans to move away from Bugzilla and email.

Thanks for setting an example on this matter, Andy.

Tangentially related is the question of the "cluster" project at the
fedorahosted location, incl. still occasionally evolving or at least
valuable material, perhaps subject to future changes -- the git tree
itself and the wiki.

For the former, there are still active branches:
- RHEL6 (head currently featuring Andy's recent commit):
  https://git.fedorahosted.org/cgit/cluster.git/commit/?h=RHEL6
- STABLE32 (Chrissie's commit from around the same time as above)
  https://git.fedorahosted.org/cgit/cluster.git/commit/?h=STABLE32

For the latter, there are some pretty authoritative documents, such
as definition of the API that fence agents should provide:
https://fedorahosted.org/cluster/wiki/FenceAgentAPI

So I think we should arrange for a move to pagure.io for this cluster
project as well if possible, if only to retain the ability to change
something should there be a need.

I can pursue this if there are no complaints.  Just let me know
(off-list) who aspires to cluster-maint group (to be created)
membership.

-- 
Jan (Poki)


pgpeY6ii7kncL.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Help! Can packmaker launch resource from new network namespace automatically

2016-12-22 Thread Jan Pokorný
[forwarding to users list as it seems a better audience to me]

On 22/12/16 05:08 +0800, Hao QingFeng wrote:
> I am newbie for pacemaker and using it to manage resource haproxy on ubuntu
> 16.04.
> 
> I met a problem that haproxy can't start listening for some services
> in vip because the related ports were occupied by some native
> services which listened on 0.0.0.0.
> 
> So I would like just  to confirm that if pacemaker can create a new
> network namespace for haproxy(or other manged resource)
> automatically to avoid such socket binding conflict?

No, pacemaker does not have that ability per se, and I don't expect it
will ever go in the systemd direction (i.e. a piece of software so
tailored to a particular OS since some particular version, because of
depending on recent kernel features, that it cannot be run elsewhere,
as opposed to portability across various more or less POSIX-compliant
systems).

However, that does not mean that you cannot achieve such extra
behavior at all -- quite the opposite, as shell scripting in resource
agents, where the core business logic for a particular resource happens
to be outsourced, allows you to do whatever is available through
command-line tools.  And for your goal, there indeed are tools that may
come in useful, see ip-netns(8) and nsenter(1) from the iproute and
util-linux packages, respectively.

> If yes, how to configure it? If no, do you have any advice on how to
> solve the problem?

See above.

Still, I would start with checking that haproxy or the conflicting
services indeed cannot be instructed which local addresses (not) to
listen at, before rolling out anything as complex as per-resource
namespaces.  Alternatively, there's a PrivateNetwork directive
that can be used in the systemd unit file of haproxy, and you can let
pacemaker start it through systemd.
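
Just to sketch what the namespace route could amount to inside an
agent's start logic (everything below is illustrative only, not
a ready-made recipe):

  ip netns add haproxy-ns
  ip netns exec haproxy-ns haproxy -f /etc/haproxy/haproxy.cfg -D

while the systemd alternative would amount to a small drop-in, e.g.:

  mkdir -p /etc/systemd/system/haproxy.service.d
  printf '[Service]\nPrivateNetwork=yes\n' \
      > /etc/systemd/system/haproxy.service.d/private-network.conf
  systemctl daemon-reload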

-- 
Jan (Poki)


pgp1yvJra6f62.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers


[ClusterLabs Developers] @ClusterLabs/devel COPR with new libqb (Was: [ClusterLabs] libqb 1.0.1 release)

2016-11-24 Thread Jan Pokorný
On 24/11/16 10:42 +, Christine Caulfield wrote:
> I am very pleased to announce the 1.0.1 release of libqb

For instant tryout on Fedora/EL-based distros, there is already
a habitual COPR build.  But this time around, I'd like to introduce
some advancements in the process...

* * *

First, we now have a dedicated ClusterLabs group established in COPR,
and so far the only, devel, repository underneath, see:

https://copr.fedorainfracloud.org/coprs/g/ClusterLabs/devel/

The page hopefully states clearly what to expect; it's by no means
intended to eclipse fine-tuned downstream packages[*].  The packages
are provided AS THEY ARE and the distros themselves have no liabilities,
so please do not file bugs at downstream trackers -- any feedback
at the upstream level is still appreciated (as detailed), though.

[*] that being said, Fedora is receiving an update soonish

* * *

Second, new packages are generated once a new push of changesets
occurs at the respective upstream repositories, so it's always at
one's discretion whether to pick a particular tagged version of
the component, or whichever else (usually the newest one).

So to update strictly to 1.0.1 version of libqb from here and
supposing you have dnf available + your distro is directly covered
with the builds, you would have to do as root:

  # dnf copr enable @ClusterLabs/devel
  # dnf update libqb-1.0.1-1$(rpm -E %dist)

as mere "dnf update libqb" would currently update even higher,
up to 1.0.1-1.2.d03b7 (2 commits pass the 1.0.1 version)
as of writing this email.

In other words, not specifying the particular version will provide
you with the latest greatest version, which is only useful if you
want to push living on the bleeding edge to the extreme (and this
COPR setup is hence a means of "continuous delivery" to shout a first
buzzword here).  It's good to be aware of this.
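
(To check what exactly is on offer before deciding, something like

  # dnf --showduplicates list libqb

can help.)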

* * *

[now especially for developers ML readers]

Third, the coverage of the ClusterLabs-associated packages is
going to grow.  So far, there's pacemaker in the pipeline[**].
There's also an immediate benefit for developers of these packages,
as the cross-dependencies are primarily satisfied within the same
COPR repository, which means that here, the latest development version
of pacemaker will get built against the latest version of libqb at
that moment, and thanks to pacemaker's unit tests (as a hook in the
%check scriptlet when building the RPM package), there's also
really a notion of integration testing (finally "continuous
integration" in a proper sense, IMHO; the other term to mention here).

That being said, if you work on a fellow project and want it to join
this club (and you are not a priori against Fedora affiliation, as that
requires you to obtain an account in the Fedora Account System), please
contact me off-list and we'll work it out.

[**] https://github.com/ClusterLabs/pacemaker/pull/1182

* * *

Hope you'll find this useful.

-- 
Jan (Poki)


pgpxt0sIRFYbk.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-11-03 Thread Jan Pokorný
On 03/11/16 19:37 +, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> On 10/21/2016 07:40 PM, Adam Spiers wrote:
>>> Ken Gaillot  wrote:
 On 09/26/2016 09:15 AM, Adam Spiers wrote:
> For example, could Pacemaker be extended to allow hybrid resources,
> where some actions (such as start, stop, status) are handled by (say)
> the systemd backend, and other actions (such as monitor) are handled
> by (say) the OCF backend?  Then we could cleanly rely on dbus for
> collaborating with systemd, whilst adding arbitrarily complex
> monitoring via OCF RAs.  That would have several advantages:
> 
> 1. Get rid of grotesque layering violations and maintenance boundaries
>where the OCF RA duplicates knowledge of all kinds of things which
>are distribution-specific, e.g.:
> 
>  
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56
 
 A simplified agent will likely still need distro-specific intelligence
 to do even a limited subset of actions, so I'm not sure there's a gain
 there.
>>> 
>>> What distro-specific intelligence would it need?  If the OCF RA was
>>> only responsible for monitoring, it wouldn't need to know a lot of the
>>> things which are only required for starting / stopping the service and
>>> checking whether it's running, e.g.:
>>> 
>>>   - Name of the daemon executable
>>>   - uid/gid it should be started as
>>>   - Daemon CLI arguments
>>>   - Location of pid file
>>> 
>>> In contrast, an OCF RA only responsible for monitoring would only need
>>> to know how to talk to the service, which is not typically
>>> distro-specific; in the REST API case, it only needs to know the endpoint
>>> URL, which would be configured via Pacemaker resource parameters anyway.
>> 
>> If you're only talking about monitors, that does simplify things. As you
>> mention, you'd still need to configure resource parameters that would
>> only be relevant to the enhanced monitor action -- parameters that other
>> actions might also need, and get elsewhere, so there's the minor admin
>> complication of setting the same value in multiple places.
> 
> Which same value(s)?
> 
> In the OpenStack case (which is the only use case I have), I don't
> think this will happen, because the "monitor" action only needs to
> know the endpoint URL and associated credentials, which doesn't
> overlap with what the other actions need to know.  This separation of
> concerns feels right to me: the start/stop/status actions are
> responsible for managing the state of the service, and the monitor
> action is responsible for monitoring whether it's delivering what it
> should be.  It's just like the separation between admins and end
> users.

This is a side topic: if I am not mistaken, so far no resource agents
(at least those to be found under the ClusterLabs GitHub entity) could
have sensitive data like passwords specified.  There was no reason.
Now, it sounds like this could change.  FAs (fence agents) generally
allow sensitive data to be obtained also by external scripts.  If this
design principle was to be followed, perhaps it would make sense to
consider some kind of "value-or-getter" provision in future OCF
revisions.

> 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
>to systemd, thereby increasing readability and reducing maintenance
>burden.
> 
> 3. OCF RAs are more likely to work out of the box with any distro,
>or at least require less work to get working.
> 
> 4. Services behave more similarly regardless of whether managed by
>Pacemaker or the standard pid 1 service manager.  For example, they
>will always use the same pidfile, run as the same user, in the
>right cgroup, be invoked with the same arguments etc.
> 
> 5. Pacemaker can still monitor services accurately at the
>application-level, rather than just relying on naive pid-level
>monitoring.
> 
> Or is this a terrible idea? ;-)
 
 I considered this, too. I don't think it's a terrible idea, but it does
 pose its own questions.
 
 * What hybrid actions should be allowed? It seems dangerous to allow
 starting from one code base and stopping from another, or vice versa,
 and really dangerous to allow something like migrate_to/migrate_from to
 be reimplemented. At one extreme, we allow anything and leave that
 responsibility on the user; at the other, we only allow higher-level
 monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.
>>> 
>>> Just monitors would be good enough for me.
>> 
>> The tomcat RA (which could also benefit from something like this) would
>> extend start and stop as well, e.g. start = systemctl start plus some
>> bookkeeping.
> 
> Ahh OK, interesting.  What kind of bookkeeping?

I think he means anything on top of plain start.  Value added with
resource agents is usually parameterization of at least basics like
some configuratio

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-26 Thread Jan Pokorný
On 26/09/16 11:39 -0500, Ken Gaillot wrote:
> On 09/26/2016 09:10 AM, Adam Spiers wrote:
>> Now, here I *do* see a potential problem.  If service B is managed by
>> Pacemaker, is configured with Requires=A and After=A, but service A is
>> *not* managed by Pacemaker, we would need to ensure that on system
>> shutdown, systemd would shutdown Pacemaker (and hence B) *before* it
>> (systemd) shuts down A, otherwise A could be stopped before B,
>> effectively pulling the rug from underneath B's feet.
>> 
>> But isn't that an issue even if Pacemaker only uses systemd resources?
>> I don't see how the currently used override files protect against this
>> issue.  Have I just "discovered" a bug, or more likely, is there again
>> a gap in my understanding?
> 
> Systemd handles the dependencies properly here:
> 
> - A must be stopped after B (B's After=A)
> - B must be stopped after pacemaker (B's Before=pacemaker via override)
> - therefore, stop pacemaker, then A (which will be a no-op because
>   pacemaker will already have stopped it), then B

without reading too much about systemd behavior here, shouldn't this be:

- therefore, stop pacemaker, then B (which will be a no-op because
  pacemaker will already have stopped it), then A

(i.e., A and B swapped)?

-- 
Jan (Poki)


pgpImw3LZliTj.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-26 Thread Jan Pokorný
On 26/09/16 15:15 +0100, Adam Spiers wrote:
> [snipped]
> 
> To clarify: I am not religiously defending this "wrapper OCF RA" idea
> of mine to the death.  It certainly sounds like it's not as clean as I
> originally thought.  But I'm still struggling to see any dealbreaker.
> 
> OTOH, I'm totally open to better ideas.
> 
> For example, could Pacemaker be extended to allow hybrid resources,
> where some actions (such as start, stop, status) are handled by (say)
> the systemd backend, and other actions (such as monitor) are handled
> by (say) the OCF backend?  Then we could cleanly rely on dbus for
> collaborating with systemd, whilst adding arbitrarily complex
> monitoring via OCF RAs.

Yes, I totally forgot about "monitor" action in the original post. 
It would also usually be implemented by the mentioned
"systemd+hooks" class, just like the mentioned "pre-start" and
"post-stop" equivalents (note that the behavior of standard OCF agents
could be split so that, say, the "start" action is the "pre-start"
action plus the daemon executable invocation, which would make the
parts of the behavior more reusable, e.g., as systemd hooks, than is
the case nowadays).
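
A sketch of what such a hooks executable might look like under the
hypothetical "systemd+hooks" class (the action names follow the proposal;
nothing of this exists today, and "myapp" is just a placeholder):

    #!/bin/sh
    # Hypothetical hooks executable: systemd runs the daemon itself,
    # the cluster calls this only around it.
    case "$1" in
    pre-start)      # prepare grounds: runtime dirs, generated config, ...
        install -d -o myapp /run/myapp
        ;;
    post-stop)      # clean up whatever pre-start created
        rm -rf /run/myapp
        ;;
    monitor)        # application-level check on top of systemd's view
        myapp-ctl ping || exit 7    # OCF_NOT_RUNNING
        ;;
    meta-data)      # advertise parameters, as OCF agents do today
        cat /usr/share/myapp/hooks-metadata.xml
        ;;
    esac
    exit 0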

-- 
Jan (Poki)




Re: [ClusterLabs Developers] Resurrecting OCF

2016-09-22 Thread Jan Pokorný
On 21/09/16 16:26 -0500, Ken Gaillot wrote:
> On 09/21/2016 10:55 AM, Jan Pokorný wrote:
>> On 21/09/16 14:50 +1000, Andrew Beekhof wrote:
>>> I like where this is going.
>>> Although I don’t think we want to get into the business of trying to
>>> script config changes from one agent to another, so I’d drop #4
>> 
>> Not agent parameter changes, just its specification -- to reflect
>> formally what the proposed symlink-based delegation scheme does when
>> the old one is still in use.  If the old and new are incompatible,
>> such automatic delegation is not possible anyway (that's one of
>> the reasons "description" would come handy).
>> 
>> I see there's much bigger potential (parameter renames, ...) but for
>> that, each agent should be responsible on its own (somehow, subject
>> of further evolution).
>> 
>> Also, supposing there are more consumers of RA, the suggestion to
>> run the script should be more generic ("when used from under
>> pacemaker, ...").
>> 
>>> I would make .deprecated a nested directory so that if we want to
>>> retire (for example) a ClusterLabs agent in the future we can create
>>> .deprecate/clusterlabs/ and put the agent there. Rather than make
>>> this heartbeat specific.
>> 
>> Good point; it would also prevent clashes when single directory should
>> serve all the providers.
> 
> I don't understand the desire to treat "deprecated" agents any
> differently. It should be sufficient to just mention it in their help
> text / man page / meta-data / other documentation. Pacemaker isn't going
> to run a "deprecated" agent any differently.

And I don't understand how you came to the conclusion that anything
changed from the outer view, besides an occasional note about
deprecation being emitted to the logs.

It'd be an implementation detail self-contained in resource-agents.

> When users see ocf:whatever:whatever, they know where to look for the
> script. Why frustrate them by making them waste time figuring out how a
> "nonexistent" RA is being used and finding it?

A symlink makes a clear connection and besides, I proposed a "new-alias"
action.  I think you overestimate how often the agents are physically
investigated (I guess the project would have more committers if that
were the case).

> If the goal is to let users know that an agent is deprecated (which is
> the only reason that I can think of), then we can add an attribute in
> the meta-data, and UIs/pacemaker can report/log it if present.
> 
>   <resource-agent name="Evmsd"
>                   deprecated="No longer actively maintained"
>                   ...>
> 
>>> I wonder if some of this should live in pacemaker itself though…
>> 
>> This runs directly to the other side of the RA-pacemaker bias,
>> pacemaker caring about RA evolutionary internals :-)
>> 
>> In the outlook, that would make any separated OCF standard efforts
>> worthless and we could just call it pacemaker resource standard
>> right away and forget about any sort of self-containment
>> (the proposed procedure aims to align with).
>> 
>> I am not sure that would be the best thing.
> 
> Agreed, anything we come up with should be explicit in the OCF standard.
> But I think this behavior could be specified in the standard.

As the standard provides guarantees for outer interfacing, there's no
utter need to externalize otherwise self-contained subtleties, in this
case beyond saying that symlinks to __formatted__ files should be
excluded from agent lists (might be overridden on demand).

>>> If resources_action_create() cannot find ocf:${provider}:${agent} in
>>> its usual location, look up
>>> ${OCF_ROOT_DIR}/.compat/${provider}/__entries__
>>> 
>>> Format for __entries__:
>>># old, replacement
>>># ${agent} , ${new_provider}:${new_agent} , ${description}
>>>IPaddr , clusterlabs:IP , Replaced with different semantics
>>>IPaddr2 , clusterlabs:IP , Moved
>>>drbd , linbit:drbd , Moved
>>>eDirectory , , Deleted
>> 
>> Additional "what happened" field might work well in the update
>> suggestions.
>> 
>>> Assuming an entry is found:
>>> - If .compat/${old_provider}/${old_agent} exists, notify the user
>>>“somehow”, then call it.
>>> - Otherwise, return OCF_ERR_NOT_INSTALLED and use ${description} and
>>>   ${replacement} as the exit reason (which shows up in pcs status).
>>> 
>>> Perhaps the “somehow” is creating PCMK_OCF_DEPRECATED (with the same
>>> semantics 

[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-21 Thread Jan Pokorný
Hello,

https://github.com/ClusterLabs/resource-agents/pull/846 seems to be
a first crack on integrating systemd to otherwise init-system-unaware
resource-agents.

As pacemaker already handles native systemd integration, I wonder if
it wouldn't be better to just allow, on top of that, perhaps as
special "systemd+hooks" class of resources that would also accept
"hooks" (meta) attribute pointing to an executable implementing
formalized API akin to OCF (say on-start, on-stop, meta-data
actions) that would take care of initial reflecting on the rest of
the parameters + possibly a cleanup later on.

Technically, something akin to injecting Environment, ExecStartPre
and ExecStopPost to the service definition might also achieve the
same goal if there's a transparent way to do it from pacemaker using
just systemd API (I don't know).
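
Sketched by hand, that injection would amount to roughly the following
drop-in (paths, unit name and the hooks executable are assumptions;
whether pacemaker could create this transparently over the systemd API
is exactly the open question):

    # hypothetical drop-in adding cluster hooks to "myapp.service"
    mkdir -p /etc/systemd/system/myapp.service.d
    cat > /etc/systemd/system/myapp.service.d/60-cluster-hooks.conf <<'EOF'
    [Service]
    Environment=OCF_RESKEY_port=8080
    ExecStartPre=/usr/libexec/cluster-hooks/myapp pre-start
    ExecStopPost=/usr/libexec/cluster-hooks/myapp post-stop
    EOF
    systemctl daemon-reload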

Indeed, the scenario I have in mind would make do with separate
"prepare grounds" agent, suitably grouped with such systemd-class
resource, but that seems more fragile configuration-wise (this
is not the granularity cluster administrator would be supposed
to be thinking in, IMHO, just as with ocf class).

Just thinking aloud before the can is open.

-- 
Jan (Poki)




Re: [ClusterLabs Developers] Resurrecting OCF

2016-09-21 Thread Jan Pokorný
On 21/09/16 14:50 +1000, Andrew Beekhof wrote:
> I like where this is going.
> Although I don’t think we want to get into the business of trying to
> script config changes from one agent to another, so I’d drop #4

Not agent parameter changes, just its specification -- to reflect
formally what the proposed symlink-based delegation scheme does when
the old one is still in use.  If the old and new are incompatible,
such automatic delegation is not possible anyway (that's one of
the reasons "description" would come handy).

I see there's much bigger potential (parameter renames, ...) but for
that, each agent should be responsible on its own (somehow, subject
of further evolution).

Also, supposing there are more consumers of RA, the suggestion to
run the script should be more generic ("when used from under
pacemaker, ...").

> I would make .deprecated a nested directory so that if we want to
> retire (for example) a ClusterLabs agent in the future we can create
> .deprecate/clusterlabs/ and put the agent there. Rather than make
> this heartbeat specific.

Good point; it would also prevent clashes when single directory should
serve all the providers.

> I wonder if some of this should live in pacemaker itself though…

This runs directly to the other side of the RA-pacemaker bias,
pacemaker caring about RA evolutionary internals :-)

In the outlook, that would make any separated OCF standard efforts
worthless and we could just call it pacemaker resource standard
right away and forget about any sort of self-containment
(the proposed procedure aims to align with).

I am not sure that would be the best thing.

> If resources_action_create() cannot find ocf:${provider}:${agent} in
> its usual location, look up
> ${OCF_ROOT_DIR}/.compat/${provider}/__entries__
> 
> Format for __entries__:
># old, replacement
># ${agent} , ${new_provider}:${new_agent} , ${description}
>IPaddr , clusterlabs:IP , Replaced with different semantics
>IPaddr2 , clusterlabs:IP , Moved
>drbd , linbit:drbd , Moved
>eDirectory , , Deleted

Additional "what happened" field might work well in the update
suggestions.

> Assuming an entry is found:
> - If .compat/${old_provider}/${old_agent} exists, notify the user
>“somehow”, then call it.
> - Otherwise, return OCF_ERR_NOT_INSTALLED and use ${description} and
>   ${replacement} as the exit reason (which shows up in pcs status).
> 
> Perhaps the “somehow” is creating PCMK_OCF_DEPRECATED (with the same
> semantics as PCMK_OCF_DEGRADED) and prepending ${description} to the
> output (assuming its not a metadata op) and/or the exit reason[1].
> Maybe only on successful start operations to minimise the noise?
> 
> [1] Shouldn’t be too hard with some extra fields for 'struct
> svc_action_private_s’ or svc_action_t
> 
> 
>> On 19 Aug 2016, at 6:59 PM, Jan Pokorný  wrote:
>> 
>> On 18/08/16 17:27 +0200, Klaus Wenninger wrote:
>>> On 08/18/2016 05:16 PM, Ken Gaillot wrote:
>>>> On 08/18/2016 08:31 AM, Kristoffer Grönlund wrote:
>>>>> Jan Pokorný  writes:
>>>>> 
>>>>>> Thinking about that, ClusterLabs may be considered a brand established
>>>>>> well enough for "clusterlabs" provider to work better than anything
>>>>>> general such as previously proposed "core".  Also, it's not expected
>>>>>> there will be more RA-centered projects under this umbrella than
>>>>>> resource-agents (pacemaker deserves to be a provider on its own),
>>>>>> so it would be pretty unambiguous pointer.
>>>>> I like this suggestion as well.
>>>> Sounds good to me.
>>>> 
>>>>>> And for new, not well-tested agents within resource-agents, there could
>>>>>> also be a provider schema akin to "clusterlabs-staging" introduced.
>>>>>> 
>>>>>> 1 CZK
>>>>> ...and this too.
>>>> I'd rather not see this. If the RA gets promoted to "well-tested",
>>>> everyone's configuration has to change. And there's never a clear line
>>>> between "not well-tested" and "well-tested", so things wind up staying
>>>> in "beta" status long after they're widely used in production, which
>>>> unnecessarily makes people question their reliability.
>>>> 
>>>> If an RA is considered experimental, say so in the documentation
>>>> (including the man page and help text), and give it an "0.x" version 
>>>> number.
>>>> 
>>>>> Here is another one: While we 

Re: [ClusterLabs Developers] Resurrecting OCF

2016-09-05 Thread Jan Pokorný
On 18/08/16 15:31 +0200, Kristoffer Grönlund wrote:
> A pet peeve of mine would also be to move heartbeat/IPaddr2 to
> clusterlabs/IP, to finally get rid of that weird 2 in the name...

Just recalled I used to be uncomfortable with "apache" (also present
in rgmanager's breed of RAs) as it's no longer unambiguous due to
a handful of other "Apache X" projects (and Apache is rather the name
of the parent foundation, anyway).

And as I've just discovered, httpd is unfortunately not unambiguous
either -- there's at least the (currently disjoint) OpenBSD variant.

(sigh)

-- 
Jan (Poki)




Re: [ClusterLabs Developers] Potential logo for Cluster Labs

2016-08-25 Thread Jan Pokorný
On 25/08/16 09:17 -0500, Ken Gaillot wrote:
> On 08/25/2016 09:02 AM, Kristoffer Grönlund wrote:
>> Klaus Wenninger  writes:
>> 
>>> On 08/25/2016 03:13 PM, Andrew Price wrote:
 On 25/08/16 13:58, Klaus Wenninger wrote:
> On 08/25/2016 12:49 PM, Andrew Price wrote:
>> On 24/08/16 18:50, Ken Gaillot wrote:
>>> Suggestions/revisions/alternatives are welcome.
>> 
>> Here's a possible alternative theme. It's similarly greyscale and I'm
>> not hugely happy with the font (I don't seem to have many good ones
>> installed) but I'm happy enough with it to throw it on the pile :)
>> 
>> Alright, if we're throwing logo design ideas on a pile, here's mine!
>> 
>> The idea being basically a beaker with servers in it, hence.. Clusterlabs.
> 
> Bwahaha ... I love it.

Yeah, imagine an animated, blinking version :)

-- 
Jan (Poki)




Re: [ClusterLabs Developers] Potential logo for Cluster Labs

2016-08-24 Thread Jan Pokorný
On 24/08/16 12:50 -0500, Ken Gaillot wrote:
> I was doodling the other day and came up with a potential logo for
> Cluster Labs. I've attached an example of what I came up with. It's
> meant to subtly represent an outer "C" of resources around an inner "L"
> of nodes.
> 
> We have a Pacemaker logo used on the website already, but I thought it
> might be nice to have a Cluster Labs logo for the website and
> documentation, that could tie all the various projects together.
> 
> Comments anyone? The example here is greyscale for discussion purposes,
> but the final should have some color scheme.
> Suggestions/revisions/alternatives are welcome.

Not a bad idea to start with.

I just hope the graphic source boils down to vectors (preferably SVG).
We are not binary patchers, after all ;-)

-- 
Jan (Poki)




Re: [ClusterLabs Developers] Resurrecting OCF

2016-08-19 Thread Jan Pokorný
On 19/08/16 13:12 +0200, Jan Pokorný wrote:
> On 19/08/16 11:14 +0200, Jan Pokorný wrote:
>> On 19/08/16 10:59 +0200, Jan Pokorný wrote:
>>> So, having some more thoughts on this, here's the possible action
>>> plan (just for heartbeat -> clusterlabs transition + deprecating
>>> some agents, but clusterlabs-staging -> clusterlabs would be similar):
>>> 
>>> # (adapt and) move original heartbeat agents
>>> 
>>> 1. have a resource.d subdirectory "clusterlabs" and move (possibly under
>>>new names) agents that were a priori updated to reflect new revision
>>>of OCF there
>>> 
>>> 2. have a resource.d subdirectory ".deprecated" (for instance) and
>>>move the RAs that are going to be sunset over there (i.e.,
>>>original heartbeat agents = agents moved to clusterlabs + agents
>>>moved to .deprecated + agents that remained under heartbeat, pending
>>>to be moved under clusterlabs)
>>> 
>>> # preparation for backward compatibility
>>> 
>>> 3. have a file with old heartbeat name -> new clusterlabs name mapping
>>>for the agents from 0., i.e., hence physically changed the directory;
>>>the format can be as simple as CSV with "old name; [new name]" lines
>>>where omitted new name means that actual name hasn't changed
>>>(unlike proposed IPaddress2 -> IP)
>>> 
>>> 4. have an XSL template that will convert resource references per the
>>>translation file from 3. (this XSLT should be automatically
>>>generated based on that file) and a script that will call
>>>something like:
>>>cibadmin -Q | xsltproc <the generated XSLT> - | cibadmin --replace --xml-pipe
>>> 
>>> 5. have a shell script "__cl_compat__" (for instance, name clearly
>>>distinguishable will become handy later on), that will:
>>>- figure which symlink it was called under ("$0") and figure out
>>>  how it should behave based on file from 3.:
>>>  . $0 found as old name with new name -> clusterlabs/<new name>
>>>    will be called
>>>  . $0 found as old name without new name -> clusterlabs/<old name>
>>>    will be called
>>>  . $0 not found as old name -> .deprecated/<old name> will be
>>>    called if exists (otherwise fail early)
>>>- if "$HA_RSCTMP/$(basename $0)_compat" exists, just run:
>>>  $0 "$@"; exit $?
>>>  the purpose here is to avoid excessive spamming in the logs
>>>- touch "$HA_RSCTMP/$(basename $0)_compat"
>>>- emit a warning "Your configuration refers to the agent with
>>>  an obsolete specification", followed with corresponding:
>>>   . "please consider changing ocf:heartbeat: to
>>>  ocf:clusterlabs:, you may use 
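
A minimal sketch of the "__cl_compat__" dispatcher outlined in step 5
above (the mapping file name, its location and the messages are
assumptions; error handling is mostly omitted):

    #!/bin/sh
    # Hypothetical compat dispatcher: heartbeat/<old name> symlinks point
    # here; it forwards to the clusterlabs/ (or .deprecated/) agent and
    # warns about the obsolete specification at most once per agent.
    ocf_root=${OCF_ROOT:-/usr/lib/ocf}
    old=$(basename "$0")
    new=$(awk -F'; *' -v a="$old" '$1 == a { print ($2 != "" ? $2 : $1) }' \
          "$ocf_root/resource.d/clusterlabs/__map__")

    if [ -n "$new" ]; then
        target=$ocf_root/resource.d/clusterlabs/$new
    else
        target=$ocf_root/resource.d/.deprecated/$old
        [ -x "$target" ] || exit 5      # OCF_ERR_INSTALLED, fail early
    fi

    marker=$HA_RSCTMP/${old}_compat
    if [ ! -e "$marker" ]; then
        touch "$marker"
        if [ -n "$new" ]; then
            echo "ocf:heartbeat:$old is obsolete, please switch to" \
                 "ocf:clusterlabs:$new" >&2
        else
            echo "ocf:heartbeat:$old is deprecated and pending removal" >&2
        fi
    fi
    exec "$target" "$@"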

Re: [ClusterLabs Developers] Resurrecting OCF

2016-08-19 Thread Jan Pokorný
On 19/08/16 11:14 +0200, Jan Pokorný wrote:
> On 19/08/16 10:59 +0200, Jan Pokorný wrote:
>> So, having some more thoughts on this, here's the possible action
>> plan (just for heartbeat -> clusterlabs transition + deprecating
>> some agents, but clusterlabs-staging -> clusterlabs would be similar):
>> 
>> # (adapt and) move original heartbeat agents
>> 
>> 1. have a resource.d subdirectory "clusterlabs" and move (possibly under
>>new names) agents that were a priori updated to reflect new revision
>>of OCF there
>> 
>> 2. have a resource.d subdirectory ".deprecated" (for instance) and
>>move the RAs that are going to be sunset over there (i.e.,
>>original heartbeat agents = agents moved to clusterlabs + agents
>>moved to .deprecated + agents that remained under heartbeat, pending
>>to be moved under clusterlabs)
>> 
>> # preparation for backward compatibility
>> 
>> 3. have a file with old heartbeat name -> new clusterlabs name mapping
>>for the agents from 0., i.e., hence physically changed the directory;
>>the format can be as simple as CSV with "old name; [new name]" lines
>>where omitted new name means that actual name hasn't changed
>>(unlike proposed IPaddress2 -> IP)
>> 
>> 4. have an XSL template that will convert resource references per the
>>translation file from 3. (this XSLT should be automatically
>>generated based on that file) and a script that will call
>>something like:
>>cibadmin -Q | xsltproc <the generated XSLT> - | cibadmin --replace --xml-pipe
>> 
>> 5. have a shell script "__cl_compat__" (for instance, name clearly
>>distinguishable will become handy later on), that will:
>>- figure which symlink it was called under ("$0") and figure out
>>  how it should behave based on file from 3.:
>>  . $0 found as old name with new name -> clusterlabs/<new name>
>>    will be called
>>  . $0 found as old name without new name -> clusterlabs/<old name>
>>    will be called
>>  . $0 not found as old name -> .deprecated/<old name> will be
>>    called if exists (otherwise fail early)
>>- if "$HA_RSCTMP/$(basename $0)_compat" exists, just run:
>>  $0 "$@"; exit $?
>>  the purpose here is to avoid excessive spamming in the logs
>>- touch "$HA_RSCTMP/$(basename $0)_compat"
>>- emit a warning "Your configuration refers to the agent with
>>  an obsolete specification", followed with corresponding:
>>   . "please consider changing ocf:heartbeat: to
>>  ocf:clusterlabs:, you may use 

Re: [ClusterLabs Developers] Resurrecting OCF

2016-08-19 Thread Jan Pokorný
On 19/08/16 10:59 +0200, Jan Pokorný wrote:
> So, having some more thoughts on this, here's the possible action
> plan (just for heartbeat -> clusterlabs transition + deprecating
> some agents, but clusterlabs-staging -> clusterlabs would be similar):
> 
> # (adapt and) move original heartbeat agents
> 
> 1. have a resource.d subdirectory "clusterlabs" and move (possibly under
>new names) agents that were a priori updated to reflect new revision
>of OCF there
> 
> 2. have a resource.d subdirectory ".deprecated" (for instance) and
>move the RAs that are going to be sunset over there (i.e.,
>original heartbeat agents = agents moved to clusterlabs + agents
>moved to .deprecated + agents that remained under heartbeat, pending
>to be moved under clusterlabs)
> 
> # preparation for backward compatibility
> 
> 3. have a file with old heartbeat name -> new clusterlabs name mapping
>for the agents from 0., i.e., hence physically changed the directory;
>the format can be as simple as CSV with "old name; [new name]" lines
>where omitted new name means that actual name hasn't changed
>(unlike proposed IPaddress2 -> IP)
> 
> 4. have an XSL template that will convert resource references per the
>translation file from 3. (this XSLT should be automatically
>generated based on that file) and a script that will call
>something like:
>cibadmin -Q | xsltproc <the generated XSLT> - | cibadmin --replace --xml-pipe
> 
> 5. have a shell script "__cl_compat__" (for instance, name clearly
>distinguishable will become handy later on), that will:
>- figure which symlink it was called under ("$0") and figure out
>  how it should behave based on file from 3.:
>  . $0 found as old name with new name -> clusterlabs/<new name>
>    will be called
>  . $0 found as old name without new name -> clusterlabs/<old name>
>    will be called
>  . $0 not found as old name -> .deprecated/<old name> will be
>    called if exists (otherwise fail early)
>- if "$HA_RSCTMP/$(basename $0)_compat" exists, just run:
>  $0 "$@"; exit $?
>  the purpose here is to avoid excessive spamming in the logs
>- touch "$HA_RSCTMP/$(basename $0)_compat"
>- emit a warning "Your configuration refers to the agent with
>  an obsolete specification", followed with corresponding:
>   . "please consider changing ocf:heartbeat: to
>  ocf:clusterlabs:, you may use 

Re: [ClusterLabs Developers] Resurrecting OCF

2016-08-19 Thread Jan Pokorný
On 18/08/16 17:27 +0200, Klaus Wenninger wrote:
> On 08/18/2016 05:16 PM, Ken Gaillot wrote:
>> On 08/18/2016 08:31 AM, Kristoffer Grönlund wrote:
>>> Jan Pokorný  writes:
>>> 
>>>> Thinking about that, ClusterLabs may be considered a brand established
>>>> well enough for "clusterlabs" provider to work better than anything
>>>> general such as previously proposed "core".  Also, it's not expected
>>>> there will be more RA-centered projects under this umbrella than
>>>> resource-agents (pacemaker deserves to be a provider on its own),
>>>> so it would be pretty unambiguous pointer.
>>> I like this suggestion as well.
>> Sounds good to me.
>> 
>>>> And for new, not well-tested agents within resource-agents, there could
>>>> also be a provider schema akin to "clusterlabs-staging" introduced.
>>>> 
>>>> 1 CZK
>>> ...and this too.
>> I'd rather not see this. If the RA gets promoted to "well-tested",
>> everyone's configuration has to change. And there's never a clear line
>> between "not well-tested" and "well-tested", so things wind up staying
>> in "beta" status long after they're widely used in production, which
>> unnecessarily makes people question their reliability.
>> 
>> If an RA is considered experimental, say so in the documentation
>> (including the man page and help text), and give it an "0.x" version number.
>> 
>>> Here is another one: While we are moving agents into a new namespace,
>>> perhaps it is time to clean up some of the legacy agents that are no
>>> longer recommended or of questionable quality? Off the top of my head,
>>> there are
>>> 
>>> * heartbeat/Evmsd
>>> * heartbeat/EvmsSCC
>>> * heartbeat/LinuxSCSI
>>> * heartbeat/pingd
>>> * heartbeat/IPaddr
>>> * heartbeat/ManageRAID
>>> * heartbeat/vmware
>>> 
>>> A pet peeve of mine would also be to move heartbeat/IPaddr2 to
>>> clusterlabs/IP, to finally get rid of that weird 2 in the name...
>> +1!!! (or is it -2?)
>> 
>>> Cheers,
>>> Kristoffer
>> Obviously, we need to keep the ocf:heartbeat provider around for
>> backward compatibility, for the extensive existing uses both in cluster
>> configurations and in the zillions of how-to's scattered around the web.
>> 
>> Also, despite the recommendation of creating your own provider, many
>> people drop custom RAs in the heartbeat directory.
>> 
>> The simplest approach would be to just symlink heartbeat to clusterlabs,
>> but I think that's a bad idea. If a custom RA deployment or some package
>> other than resource-agents puts an RA there, resource-agents will try to
>> make it a symlink and the other package will try to make it a directory.
>> Plus, people may have configuration management systems and/or file
>> integrity systems that need it to be a directory.
>> 
>> So, I'd recommend we keep the heartbeat directory, and keep the old RAs
>> you list above in it, move the rest of the RAs to the new clusterlabs
>> directory, and symlink each one back to the heartbeat directory. At the
>> same time, we can announce the heartbeat provider as deprecated, and
>> after a very long time (when it's difficult to find references to it via
>> google), we can drop it.
> 
> Maybe a way to go for the staging-RAs as well:
> Have them in clusterlabs-staging and symlinked (during install
> or package-generation) into clusterlabs ... while they are
> cleanly separated in the source-tree.

So, having some more thoughts on this, here's the possible action
plan (just for heartbeat -> clusterlabs transition + deprecating
some agents, but clusterlabs-staging -> clusterlabs would be similar):

# (adapt and) move original heartbeat agents

1. have a resource.d subdirectory "clusterlabs" and move (possibly under
   new names) agents that were a priori updated to reflect new revision
   of OCF there

2. have a resource.d subdirectory ".deprecated" (for instance) and
   move the RAs that are going to be sunset over there (i.e.,
   original heartbeat agents = agents moved to clusterlabs + agents
   moved to .deprecated + agents that remained under heartbeat, pending
   to be moved under clusterlabs)

# preparation for backward compatibility

3. have a file with old heartbeat name -> new clusterlabs name mapping
   for the agents from 0., i.e., hence physically changed the directory;
   the format can be as simple as CSV with "old name; [new name]" lines
   where omitted new name means that actual name hasn't changed
   (unlike proposed IPaddress2 -> IP)
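
By way of example, the mapping file from step 3 could be as small as
this, and the rewrite in step 4 is then a one-liner (file names below
are placeholders; the XSLT would be generated from the mapping):

    # mapping file, "old name; [new name]" per line:
    #   IPaddr2; IP        (renamed on the move to clusterlabs)
    #   Filesystem;        (moved, name unchanged)

    # step 4: rewrite agent references in the live CIB in one go
    cibadmin -Q \
        | xsltproc /usr/share/resource-agents/heartbeat2clusterlabs.xslt - \
        | cibadmin --replace --xml-pipe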

Re: [ClusterLabs Developers] Resurrecting OCF

2016-08-18 Thread Jan Pokorný
On 15/08/16 12:37 +0200, Jan Pokorný wrote:
> On 18/07/16 11:13 -0500, Ken Gaillot wrote:
>> A suggestion came up recently to formalize a new version of the OCF
>> resource agent API standard[1].
>> 
>> The main goal would be to formalize the API as it is actually used
>> today, and to replace the "unique" meta-data attribute with two new
>> attributes indicating uniqueness and reloadability.
> 
> My suggestion would be to consider changing the provider name for RAs
> from resource-agents upstream project to anything more reasonable
> than "heartbeat"

Thinking about that, ClusterLabs may be considered a brand established
well enough for "clusterlabs" provider to work better than anything
general such as previously proposed "core".  Also, it's not expected
there will be more RA-centered projects under this umbrella than
resource-agents (pacemaker deserves to be a provider on its own),
so it would be pretty unambiguous pointer.

And for new, not well-tested agents within resource-agents, there could
also be a provider schema akin to "clusterlabs-staging" introduced.

1 CZK

> in one step with bumping to-be-added conformance parameter in
> meta-data denoting that the RA in question reflects the requirements
> of the new revision of OCF/resource agents API (and, apparently, in
> one step with delivering any conformance adjustments needed, such as
> mentioned "unique" indicator).
> 
> Original thread regarding this related suggestion from 3 years ago:
> http://lists.linux-ha.org/pipermail/linux-ha/2013-July/047320.html
> 
> spanned also into the following month:
> http://lists.linux-ha.org/pipermail/linux-ha/2013-August/047368.html
> 
>> We could also add the fence agent API as a new spec, or expand the
>> RA spec to cover both.
> 
> Definitely, the spec(s) should be as language-agnostic as possible
> (so no pretending that, e.g., fencing library of fence-agents is
a panacea to hide all the interaction/interface details; the goal
> of the standardization work should be to allow truly interchangeable
> components).
> 
>> [...]
>> 
>> [1]
>> http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD

-- 
Jan (Poki)




  1   2   >