Yep, we've definitely got /dev/shm (this was done to fix an earlier problem).
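
For anyone following the thread, a quick way to confirm that is to test the directories as the hacluster user rather than relying on the mode bits alone. What follows is only a rough sketch: check_writable_dir is a hypothetical helper name, not part of pacemaker or libqb, and the two paths are simply the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of pacemaker or libqb): report whether a
# directory exists and whether the current user can create a file in it.
check_writable_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # Create and remove a scratch file instead of trusting permissions alone.
    probe="$dir/.probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
        return 0
    fi
    echo "not writable: $dir"
    return 1
}

# Paths taken from this thread: /dev/shm is where libqb creates shared memory
# by default, /var/run/crm holds the IPC sockets. Failures are reported only.
for d in /dev/shm /var/run/crm; do
    check_writable_dir "$d" || true
done
```

Run it from a shell owned by hacluster (the same way the touch test on /var/run/crm was done), since a check done as root says nothing about what the daemons can do.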
----------------
John White
HPC Systems Engineer
(510) 486-7307
One Cyclotron Rd, MS: 50C-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720

On Mar 27, 2013, at 4:46 PM, Andrew Beekhof <and...@beekhof.net> wrote:

> What about /dev/shm ?
> Libqb tries to create some shared memory in that location by default.
> 
> On Thu, Mar 28, 2013 at 8:50 AM, John White <jwh...@lbl.gov> wrote:
>> Yup:
>> -bash-4.1$ cd /var/run/crm/
>> -bash-4.1$ ls
>> lost+found  pcmk  pengine  st_callback  st_command
>> -bash-4.1$ touch blah
>> -bash-4.1$ ls -l
>> total 16
>> -rw-r--r-- 1 hacluster haclient     0 Mar 27 14:50 blah
>> drwx------ 2 root      root     16384 Mar 14 15:00 lost+found
>> srwxrwxrwx 1 root      root         0 Mar 22 11:25 pcmk
>> srwxrwxrwx 1 hacluster root         0 Mar 22 11:25 pengine
>> srwxrwxrwx 1 root      root         0 Mar 22 11:25 st_callback
>> srwxrwxrwx 1 root      root         0 Mar 22 11:25 st_command
>> -bash-4.1$ ls -l /var/run/| grep crm
>> drwxr-xr-x 3 hacluster haclient 4096 Mar 27 14:50 crm
>> -bash-4.1$ whoami
>> hacluster
>> -bash-4.1$
>> 
>> On Mar 25, 2013, at 4:21 PM, Andreas Kurz <andr...@hastexo.com> wrote:
>> 
>>> On 2013-03-22 19:31, John White wrote:
>>>> Hello Folks,
>>>>     We're trying to get a corosync/pacemaker instance going on a 4 node 
>>>> cluster that boots via pxe.  There have been a number of state/file system 
>>>> issues, but those appear to be *mostly* taken care of thus far.  We're 
>>>> running into an issue now where cib just isn't staying up with errors akin 
>>>> to the following (sorry for the lengthy dump, note the attrd and cib 
>>>> connection errors).  Any ideas would be greatly appreciated:
>>>> 
>>>> Mar 22 11:25:18 n0014 cib: [25839]: info: validate_with_relaxng: Creating 
>>>> RNG parser context
>>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: Invoked: 
>>>> /usr/lib64/heartbeat/attrd
>>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: crm_log_init_worker: Changed 
>>>> active directory to /var/lib/heartbeat/cores/hacluster
>>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Starting up
>>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: get_cluster_type: Cluster type 
>>>> is: 'corosync'
>>>> Mar 22 11:25:18 n0014 attrd: [25841]: notice: crm_cluster_connect: 
>>>> Connecting to cluster infrastructure: corosync
>>>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: init_cpg_connection: Could 
>>>> not connect to the Cluster Process Group API: 2
>>>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: main: HA Signon failed
>>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Cluster connection active
>>>> Mar 22 11:25:18 n0014 attrd: [25841]: info: main: Accepting attribute 
>>>> updates
>>>> Mar 22 11:25:18 n0014 attrd: [25841]: ERROR: main: Aborting startup
>>>> Mar 22 11:25:18 n0014 pengine: [25842]: info: Invoked: 
>>>> /usr/lib64/heartbeat/pengine
>>>> Mar 22 11:25:18 n0014 pengine: [25842]: info: crm_log_init_worker: Changed 
>>>> active directory to /var/lib/heartbeat/cores/hacluster
>>>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: main: Checking for old 
>>>> instances of pengine
>>>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: 
>>>> init_client_ipc_comms_nodispatch: Attempting to talk on: 
>>>> /var/run/crm/pengine
>>> 
>>> That "/var/run/crm" directory is available and owned by
>>> hacluster.haclient ... and writable by at least the hacluster user?
>>> 
>>> Regards,
>>> Andreas
>>> 
>>> --
>>> Need help with Pacemaker?
>>> http://www.hastexo.com/now
>>> 
>>>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: ERROR: pcmk_child_exit: Child 
>>>> process attrd exited (pid=25841, rc=100)
>>>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: notice: pcmk_child_exit: Child 
>>>> process attrd no longer wishes to be respawned
>>>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: info: update_node_processes: 
>>>> Node n0014.lustre now has process list: 00000000000000000000000000110312 
>>>> (was 00000000000000000000000000111312)
>>>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: 
>>>> init_client_ipc_comms_nodispatch: Could not init comms on: 
>>>> /var/run/crm/pengine
>>>> Mar 22 11:25:18 n0014 pengine: [25842]: debug: main: Init server comms
>>>> Mar 22 11:25:18 n0014 pengine: [25842]: info: main: Starting pengine
>>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug: init_cpg_connection: 
>>>> Adding fd=4 to mainloop
>>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: init_ais_connection_once: 
>>>> Connection to 'corosync': established
>>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug: crm_new_peer: Creating 
>>>> entry for node n0014.lustre/247988234
>>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_new_peer: Node 
>>>> n0014.lustre now has id: 247988234
>>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_new_peer: Node 
>>>> 247988234 is now known as n0014.lustre
>>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: debug: 
>>>> init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/pcmk
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: info: Invoked: 
>>>> /usr/lib64/heartbeat/crmd
>>>> Mar 22 11:25:18 n0014 pacemakerd: [25834]: debug: pcmk_client_connect: 
>>>> Channel 0x995530 connected: 1 children
>>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: main: Starting stonith-ng 
>>>> mainloop
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: info: crm_log_init_worker: Changed 
>>>> active directory to /var/lib/heartbeat/cores/hacluster
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: info: main: CRM Hg Version: 
>>>> a02c0f19a00c1eb2527ad38f146ebc0834814558
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: info: crmd_init: Starting crmd
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: s_crmd_fsa: Processing 
>>>> I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace: 
>>>> #011// A_LOG
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace: 
>>>> #011// A_STARTUP
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_startup: Registering Signal 
>>>> Handlers
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_startup: Creating CIB and 
>>>> LRM objects
>>>> Mar 22 11:25:18 n0014 stonith-ng: [25838]: info: crm_update_peer: Node 
>>>> n0014.lustre: id=247988234 state=unknown addr=(null) votes=0 born=0 seen=0 
>>>> proc=00000000000000000000000000110312 (new)
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: info: G_main_add_SignalHandler: Added 
>>>> signal handler for signal 17
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: do_fsa_action: actions:trace: 
>>>> #011// A_CIB_START
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: 
>>>> init_client_ipc_comms_nodispatch: Attempting to talk on: 
>>>> /var/run/crm/cib_rw
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: 
>>>> init_client_ipc_comms_nodispatch: Could not init comms on: 
>>>> /var/run/crm/cib_rw
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw: 
>>>> Connection to command channel failed
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: 
>>>> init_client_ipc_comms_nodispatch: Attempting to talk on: 
>>>> /var/run/crm/cib_callback
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: 
>>>> init_client_ipc_comms_nodispatch: Could not init comms on: 
>>>> /var/run/crm/cib_callback
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw: 
>>>> Connection to callback channel failed
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signon_raw: 
>>>> Connection to CIB failed: connection failed
>>>> Mar 22 11:25:18 n0014 crmd: [25843]: debug: cib_native_signoff: Signing 
>>>> out of the CIB Service
>>>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: Element cib failed to validate 
>>>> content
>>>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: readCibXmlFile: CIB does not 
>>>> validate with <null>
>>>> Mar 22 11:25:18 n0014 cib: [25839]: info: startCib: CIB Initialization 
>>>> completed successfully
>>>> Mar 22 11:25:18 n0014 cib: [25839]: info: get_cluster_type: Cluster type 
>>>> is: 'corosync'
>>>> Mar 22 11:25:18 n0014 cib: [25839]: notice: crm_cluster_connect: 
>>>> Connecting to cluster infrastructure: corosync
>>>> Mar 22 11:25:18 n0014 cib: [25839]: ERROR: init_cpg_connection: Could not 
>>>> connect to the Cluster Process Group API: 2
>>>> Mar 22 11:25:18 n0014 cib: [25839]: CRIT: cib_init: Cannot sign in to the 
>>>> cluster... terminating
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>> 
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>> 
>>> 
>> 