Re: [Linux-cluster] Ricci doesn't work

2012-09-12 Thread Jan Pokorný
On 12/09/12 14:00 +0200, Jan Pokorný wrote:
> cat 

Re: [Linux-cluster] Ricci doesn't work

2012-09-12 Thread Jan Pokorný
On 10/09/12 21:54 +, Chip Burke wrote:
> Well, after a few days of testing it seems luci/ricci is still flakey.

First, there seems to be more than the "XML entities" problem involved.
This is because luci, unlike ccs and ccs_sync ("cman_tool version" uses
ccs_sync under the hood), should not suffer from that one.
So this is probably yet another, separate issue...

> If I update things via luci, the cluster.conf on the local machine with
> luci updates, but nothing pushes out to other nodes. If I then manually run
> ccs_sync on the node with the new configuration, things push out to the
> other nodes fine. While this is wonky, at least it is consistent and
> repeatable so I can do things in this manner until fixes are in. Though,
> certainly let me know if you want any more logs or whatever from me.

To get a better error message in luci.log as a starting point, could
you please apply a workaround for that "no translator" issue?

The recipe, based on the real patch to fix the mentioned bug, is as
follows (as root on the host with luci installed):

---

pushd $(rpm --eval %python_sitearch)
cat 

Re: [Linux-cluster] Ricci doesn't work

2012-09-10 Thread Chip Burke
Well, after a few days of testing it seems luci/ricci is still flakey. If
I update things via luci, the cluster.conf on the local machine with luci
updates, but nothing pushes out to other nodes. If I then manually run
ccs_sync on the node with the new configuration, things push out to the
other nodes fine. While this is wonky, at least it is consistent and
repeatable so I can do things in this manner until fixes are in. Though,
certainly let me know if you want any more logs or whatever from me.

Chip Burke





--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] Ricci doesn't work

2012-09-07 Thread Jan Pokorný
On 06/09/12 18:44 +, Chip Burke wrote:
> I only eliminated the second & and that was all it took. My guess is this
> is some edge case where the two &s in the string made it break, whereas a
> single & did not cause things to escape or what have you.

Oh, I now see what is going on.  Once you had authenticated successfully
with the letters-only password, authentication became based solely on the
certificates (a password is no longer involved, no matter whether it gets
changed or not).  Hence, an XML-unsafe character in the password no longer
has a chance to confuse ricci, unlike in the case discussed earlier, where
the initial password authentication had yet to be done.

Anyway, the bugs mentioned earlier are meant to make even these
problematic characters safe to use.

> All that said, thanks again!

You're welcome.

-- 
Jan



Re: [Linux-cluster] Ricci doesn't work

2012-09-06 Thread Chip Burke
I only eliminated the second & and that was all it took. My guess is this
is some edge case where the two &s in the string made it break, whereas a
single & did not cause things to escape or what have you.

All that said, thanks again!

Chip Burke









Re: [Linux-cluster] Ricci doesn't work

2012-09-06 Thread Jan Pokorný
On 06/09/12 15:11 +, Chip Burke wrote:
> Well that was an easy enough fix finally. I thought perhaps the password
> for the VMWare fence account was the issue and updated cluster.conf with a
> place holder password of 'password'. Ricci would not work. So I updated
> the actual ricci user account to use a password of 'password' and
> restarted Ricci on all of the nodes. Ricci now works. So indeed, it
> certainly did not like a character in the password I was using which was
> 65peC&E$taFRE&U. In all likelihood the & was the problem character.

Bull's eye.

> On 9/6/12 6:11 AM, "Jan Pokorný"  wrote:
>> The easiest explanation is that this XML is not well-formed, which
>> would boil down to your obfuscated password (not offending it,
>> it's highly reasonable).  Did your password contain any XML-nonfriendly
>> character, such as one of '<>"&'?  If so, could you please try digits,
>> ASCII letters and surely-safe characters only (dot, dash, etc.)?

Admittedly, this obstacle should be easier to track down, if it is
allowed to exist at all (see below).

> To confirm that hypothesis, I changed the Ricci password to 65peC&E$taFREU
> and everything still worked as expected.

While we're at it, it should have been "65peCE$taFREU" (no & character),
shouldn't it?

> From your stand point I don't know if that needs to be coded around or what,
> but at least we know how to reproduce the issue.

Thanks for bringing up this area we should be more careful about.
As a start, I have filed these bugs:

- ricci (needs to understand the XML entities properly)
  https://bugzilla.redhat.com/show_bug.cgi?id=855121

(clients need to do a proper encoding into XML entities)
- luci: https://bugzilla.redhat.com/show_bug.cgi?id=855112
- ccs:  https://bugzilla.redhat.com/show_bug.cgi?id=855117
- ccs_sync: https://bugzilla.redhat.com/show_bug.cgi?id=855120
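
The division of work in the bugs above (clients encode, ricci decodes) can be illustrated with a round trip. This is a minimal Python sketch, not code from ricci, luci, ccs or ccs_sync: the single-element request is an assumed, simplified shape, and `encode_request`, `naive_password` and `parsed_password` are hypothetical names. It shows why the receiving side has to decode entity references with a real XML parser rather than copy the raw attribute text.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def encode_request(password: str) -> str:
    """Client side: put the password into the request as XML entities.

    The element and attribute names are an assumed, simplified request
    shape, not taken from ricci's protocol definition.
    """
    # quoteattr() escapes &, < and >, picks a safe quote character and
    # adds the surrounding quotation marks itself.
    return f'<ricci version="1.0" function="authenticate" password={quoteattr(password)}/>'


def naive_password(request: str):
    """Hypothetical broken receiver: copies the raw attribute text verbatim."""
    match = re.search(r'password="([^"]*)"', request)
    return match.group(1) if match else None


def parsed_password(request: str) -> str:
    """Entity-aware receiver: the XML parser decodes &amp; and friends."""
    return ET.fromstring(request).get("password")


if __name__ == "__main__":
    secret = "65peC&E$taFRE&U"
    request = encode_request(secret)
    print(request)
    print(naive_password(request))   # still contains "&amp;", so it would not match
    print(parsed_password(request))  # the original password again
```

Only the parser-based receiver gets back the password the user typed; the naive one would compare "65peC&amp;E$taFRE&amp;U" against the stored secret and fail.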

Also, based on studying some relevant parts of ricci's code,
I've added a few private suggestions under the umbrella of bug 849233.

> Thanks again for sticking with me on this even if the cause was somewhat
> silly.

To be fair, no matter how improbable the reason for something not
working correctly is (leaving complex configuration errors aside), one
can expect such things to be self-evident (via messages, logs, etc.),
not left as an exercise to the user and, indirectly, back to the
maintainer :-)

Thanks,
Jan


Re: [Linux-cluster] Ricci doesn't work

2012-09-06 Thread Chip Burke
Well, that was an easy enough fix in the end. I thought perhaps the
password for the VMware fence account was the issue and updated
cluster.conf with a placeholder password of 'password'. Ricci would not
work. So I updated the actual ricci user account to use a password of
'password' and restarted ricci on all of the nodes. Ricci now works. So
indeed, it certainly did not like a character in the password I was using,
which was 65peC&E$taFRE&U. In all likelihood the & was the problem
character. To confirm that hypothesis, I changed the ricci password to
65peC&E$taFREU and everything still worked as expected. So there is our
answer. From your standpoint I don't know if that needs to be coded around
or what, but at least we know how to reproduce the issue.

Thanks again for sticking with me on this even if the cause was somewhat
silly.



Chip Burke









Re: [Linux-cluster] Ricci doesn't work

2012-09-06 Thread Jan Pokorný
On 05/09/12 16:21 +, Chip Burke wrote:
> This gives me the same behavior.

Sorry, I can now see it was a bad workaround guess from the beginning.

I think the strace logs you provided contain good enough information
about the issue, yet I am still scratching my head.

Part of the reason is that there are two sub-issues, and I am not sure
whether they are independent or causally related.


The first one is hidden and very probably innocent: the EPIPE/SIGPIPE
present in ricci.strace.5573.  There is an extra empty read because
(I think) ricci's client has shut down the connection first (which seems
like an ungraceful way of disconnecting, but is still tolerable), yet
ricci, despite this, tries to send a closure notify (close_notify) message
so as to achieve the expected graceful disconnection.  Apparently, this
fails in this case, accompanied by EPIPE/SIGPIPE.
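
The sequence above can be replayed in miniature. This is a minimal Python sketch, not ricci code: it assumes a local socketpair() is a fair stand-in for the real TLS connection, and `simulate_peer_hangup` is a hypothetical name. The "client" end closes first, the "server" end sees the empty read and then fails with EPIPE when it still tries to send a final message.

```python
import errno
import socket


def simulate_peer_hangup():
    """Replay the hang-up sequence on a local socket pair.

    Returns (data_from_empty_read, errno_of_failed_send_or_None).
    """
    server, client = socket.socketpair()
    try:
        client.close()               # the client disconnects first
        eof = server.recv(4096)      # the extra empty read: b"" means EOF
        try:
            # Bytes shaped like a plaintext TLS close_notify alert record
            # (type 21, version 3.1, length 2, level warning, description 0).
            # Only a stand-in: a real session would send it encrypted.
            server.sendall(b"\x15\x03\x01\x00\x02\x01\x00")
        except BrokenPipeError as exc:
            # CPython ignores SIGPIPE by default, so the kernel's EPIPE
            # surfaces as an exception; a C daemon that has not ignored
            # SIGPIPE (or used MSG_NOSIGNAL) would also receive the signal.
            return eof, exc.errno
        return eof, None
    finally:
        server.close()


if __name__ == "__main__":
    data, err = simulate_peer_hangup()
    print(data, errno.errorcode.get(err))
```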


However, the second one, the inability to process the request, which
fails upon timeout as can be seen in ricci.strace.5575, is severe.
> read(5, "\27\3\1\0p", 5)                = 5
> read(5, "\373\303\16\20>\202%\34\211\214b\\l\260\354\3662\312\272\21<\t\r\235S\31o\361\21\265\266p"..., 112) = 112
Here the two trailing bytes of the first five (0x00 0x70) indicate the
size of the whole message, which is indeed read as expected (112 bytes).
Ricci should *not* keep trying to read past this point, as the whole XML
message should have been received by then.  But for some reason it does
(see the subsequent poll with the POLLIN flag).
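
The arithmetic above can be checked by decoding the 5-byte TLS record header from the strace excerpt. This is a minimal Python sketch; `parse_tls_record_header` is a hypothetical helper, and its content-type table covers only the four content types defined in RFC 5246. The version bytes 3.1 denote TLS 1.0, and the 112-byte length covers the encrypted record body (including its MAC and any padding), so the XML inside is somewhat shorter.

```python
import struct

# Record content types from the TLS record layer (RFC 5246, section 6.2.1).
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def parse_tls_record_header(header: bytes):
    """Split a 5-byte TLS record header into (type, (major, minor), length)."""
    if len(header) != 5:
        raise ValueError("a TLS record header is exactly 5 bytes")
    # !BBBH: network byte order, three unsigned bytes, one unsigned 16-bit int
    ctype, major, minor, length = struct.unpack("!BBBH", header)
    return CONTENT_TYPES.get(ctype, f"unknown({ctype})"), (major, minor), length


# The first read from ricci.strace.5575; strace prints octal escapes, and
# Python bytes literals accept the same notation.
print(parse_tls_record_header(b"\27\3\1\0p"))  # ('application_data', (3, 1), 112)
```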

The easiest explanation is that this XML is not well-formed, which
would boil down to your password (obfuscated in the log, which is
perfectly reasonable, no offense).  Did your password contain any
XML-unfriendly character, such as one of '<>"&'?  If so, could you please
try using only digits, ASCII letters and characters that are surely safe
(dot, dash, etc.)?
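
That hypothesis is easy to check offline. This is a minimal Python sketch: the request element is a hypothetical, simplified stand-in (`auth_request` is not ricci's real message builder), and it only shows whether the document stays well-formed XML when the password is inserted raw versus escaped.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(document: str) -> bool:
    """True if the string parses as a single well-formed XML element."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


def auth_request(password: str, encode: bool) -> str:
    """Hypothetical, simplified request; only the password handling matters."""
    # escape() handles &, < and >; the extra mapping also covers the double
    # quote, since the value sits inside a double-quoted attribute.
    value = escape(password, {'"': "&quot;"}) if encode else password
    return f'<ricci function="authenticate" password="{value}"/>'


if __name__ == "__main__":
    for pw in ("password", "65peC&E$taFRE&U", "65peC&E$taFREU"):
        raw_ok = is_well_formed(auth_request(pw, encode=False))
        escaped_ok = is_well_formed(auth_request(pw, encode=True))
        print(f"{pw!r}: raw={raw_ok} escaped={escaped_ok}")
```

With the raw value, the single-& variant fails to parse just like the original password does; a lone & is as illegal in XML as two.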


As outlined, these two issues may even be interconnected (bearing in
mind the OpenSSL error queue in the main thread, which does not get
cleared explicitly as it probably should).  I am going to look into it
further (perhaps putting together a simple client for you to try) once
I know the situation with your password.

If there is nothing suspicious about the password you use to authenticate
against ricci, the inverse of the previously suggested workaround could
be tried (manually pre-authenticating ccs against ricci); from the host
ccs is run on, something along these lines:

$ ccs ...  # if ~/.ccs/cacert.pem does not exist yet
$ RICCIHOST=machina
$ RICCICERT=$(mktemp -u /var/lib/ricci/certs/clients/client_cert_XXXXXX)
$ scp ~/.ccs/cacert.pem root@$RICCIHOST:$RICCICERT
$ ssh root@$RICCIHOST chown ricci:root $RICCICERT
$ ssh root@$RICCIHOST restorecon $RICCICERT  # if using SELinux
$ ssh root@$RICCIHOST service ricci restart
$ ccs ...

Thanks,
Jan



Re: [Linux-cluster] Ricci doesn't work

2012-09-05 Thread Chip Burke
This gives me the same behavior. On each node I ran:


# rm -f /var/lib/ricci/certs/clients/*

# service ricci restart

Then from one node I simply incremented the version in cluster.conf and ran

# cman_tool version -r
You have not authenticated to the ricci daemon on xanadunode1
Password: 
You have not authenticated to the ricci daemon on xanadunode2
Password: 
You have not authenticated to the ricci daemon on xanadunode3
Password: 
The connection to xanadunode1 died unexpectedly
The connection to xanadunode2 died unexpectedly
The connection to xanadunode3 died unexpectedly
cman_tool: ccs_sync failed.
If you have distributed the config file yourself, try re-running with -S
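
The "increment the version in cluster.conf" step can be scripted. This is a minimal Python sketch, assuming the usual cluster.conf layout in which config_version is an attribute of the root <cluster> element; `bump_config_version` is a hypothetical helper and the sample cluster name is made up. It only rewrites the text; distributing the result is still left to cman_tool version -r.

```python
import xml.etree.ElementTree as ET


def bump_config_version(conf_text: str):
    """Return (new_xml_text, new_version) with config_version incremented."""
    root = ET.fromstring(conf_text)
    if root.tag != "cluster":
        raise ValueError(f"expected <cluster> root element, got <{root.tag}>")
    new_version = int(root.get("config_version", "0")) + 1
    root.set("config_version", str(new_version))
    # Caveat: ElementTree's default parser drops XML comments and the
    # declaration is not written back, so the output is not byte-identical.
    return ET.tostring(root, encoding="unicode"), new_version


if __name__ == "__main__":
    sample = '<cluster name="example" config_version="7"><clusternodes/></cluster>'
    text, version = bump_config_version(sample)
    print(version)
    print(text)
```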



Let me know if there is any other info I can give you.

Thanks again!


Chip Burke









Re: [Linux-cluster] Ricci doesn't work

2012-09-05 Thread Jan Pokorný
On 20/08/12 14:51 +, Chip Burke wrote:
> Thanks for sticking with me on this.

Sorry for the delay.

Regarding the traceback from your initial email, which ended with:
> TypeError: No object (name: translator) has been registered for this
> thread

I overlooked it at first, but I have independently hit it too [1].

But this is orthogonal to the main issue you are describing, which
I am having trouble figuring out.

Could you please try the following as a possible workaround?
On the particular node (or on all nodes in the cluster), try running
rm -f /var/lib/ricci/certs/clients/* and restarting ricci.
As a consequence, you may be prompted for the password to authenticate
against the ricci instance(s), and/or you may need to re-add the cluster
in luci even if this was already done before.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=853151

Thanks,
Jan



Re: [Linux-cluster] Ricci doesn't work

2012-08-20 Thread Chip Burke
Thanks for sticking with me on this. Here's the log:

https://dl.dropbox.com/u/8137282/strace.ricci.tgz

Chip Burke








Re: [Linux-cluster] Ricci doesn't work

2012-08-20 Thread Jan Pokorný
Hello Chip,

On 17/08/12 15:14 +, Chip Burke wrote:
> Libvirt is not installed on any of the hosts.

could you please provide a strace log (ideally as a gzipped attachment
sent off-list; alternatively, if the log is not too large, you can use
e.g. fpaste.org and provide a link)?

Something like (untested):

# strace -fp $(pidof ricci) -ff -o ricci
# tar czf ricci-strace.tar.gz ricci.*

While strace is running, please have luci or ccs access ricci
so these attempts are captured in the strace log.
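
Once the per-process files exist, the symptoms discussed in this thread (EPIPE/SIGPIPE, empty reads, poll timeouts) can be counted mechanically. This is a minimal Python sketch; `scan_lines` and `scan_files` are hypothetical helpers, and the regular expressions assume strace's default line format without timestamps or PID prefixes, which is what -ff -o produces.

```python
import glob
import re
from collections import Counter

# Patterns for the symptoms discussed in this thread, in strace's usual
# "syscall(args) = result" output format.
PATTERNS = {
    "EPIPE": re.compile(r"= -1 EPIPE\b"),
    "SIGPIPE": re.compile(r"--- SIGPIPE\b"),
    "empty read": re.compile(r'\bread\(\d+, "", \d+\)\s*= 0\b'),
    "poll timeout": re.compile(r"\bpoll\(.*\)\s*= 0 \(Timeout\)"),
}


def scan_lines(lines):
    """Count how many lines match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


def scan_files(pattern="ricci.*"):
    """Scan per-process strace files, e.g. ricci.<pid> from -ff -o ricci."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            results[path] = scan_lines(handle)
    return results


if __name__ == "__main__":
    for path, counts in scan_files().items():
        print(path, dict(counts))
```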

Thanks,
Jan

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] Ricci doesn't work

2012-08-17 Thread Chip Burke
Libvirt is not installed on any of the hosts.

Chip Burke









Re: [Linux-cluster] Ricci doesn't work

2012-08-16 Thread Chris Feist

On 08/16/12 15:09, Chip Burke wrote:
> time ccs -d -h xanadunode2 --getconf
>
> Gives me
>
> real    2m7.027s
> user    0m0.053s
> sys     0m0.013s


Can you try removing libvirt (rpm -e libvirt), restarting ricci
(/etc/init.d/ricci restart), and then running that command again?









Re: [Linux-cluster] Ricci doesn't work

2012-08-16 Thread Chip Burke
time ccs -d -h xanadunode2 --getconf

Gives me

real    2m7.027s
user    0m0.053s
sys     0m0.013s



So it sits for quite a while.

Chip Burke








Re: [Linux-cluster] Ricci doesn't work

2012-08-16 Thread Chris Feist

On 08/16/12 13:52, Chip Burke wrote:

Here we go...

node1:
ricci-0.16.2-55.el6.x86_64


node2:
ricci-0.16.2-55.el6.x86_64

[root@xanadunode2 ~]# service ricci stop
Shutting down ricci:                                        [  OK  ]
[root@xanadunode2 ~]# service ricci start
Starting ricci:                                             [  OK  ]
[root@xanadunode2 ~]# ccs -d -h localhost --getconf
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
localhost password:
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
Error: no ricci tag in ricci response

Of course I fudged the password with XXXs for the list.


How long does it take for you to get the "Error: no ricci tag in ricci 
response"?  Is it pretty quick or does it take around 30 seconds?
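
To tell a quick failure from a long hang (such as the 2m7s reported elsewhere in this thread), the command can be timed with a hard cap. This is a minimal Python sketch; `timed_run` is a hypothetical helper and the 300-second cap is an arbitrary assumption. It does the same job as prefixing the command with `time`, but stops waiting once the cap is reached.

```python
import subprocess
import sys
import time


def timed_run(argv, timeout_s=300.0):
    """Run argv and return (elapsed_seconds, exit_status).

    exit_status is None when the command was killed after timeout_s seconds.
    stdin/stdout/stderr are inherited, so an interactive password prompt
    (as ccs shows before authenticating) still works.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return time.monotonic() - start, None
    return time.monotonic() - start, completed.returncode


if __name__ == "__main__":
    # Defaults to the command from this thread; pass another one as arguments.
    command = sys.argv[1:] or ["ccs", "-d", "-h", "xanadunode2", "--getconf"]
    elapsed, status = timed_run(command)
    print(f"{elapsed:.1f}s, exit status: {status}")
```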





Thanks!

Chip Burke







On 8/16/12 12:30 PM, "Chris Feist"  wrote:


On 08/15/12 16:49, Chip Burke wrote:

There is nothing in messages or secure on either node1 or 2 at that
time.


Ok, there's something going on with the ricci authentication on that
node.  Can you give me the output of 'rpm -q ricci' as well as do a
'/etc/init.d/ricci restart'.

Then on the node that is running ricci, try this command:
ccs -d -h localhost --getconf

(it should ask you for a password; enter the ricci password)

Thanks,
Chris



Chip Burke







On 8/15/12 4:50 PM, "Chris Feist"  wrote:


On 08/15/12 15:08, Chip Burke wrote:

modcluster-0.16.2-18.el6.x86_64

And

[root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
xanadunode2 password:
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
Error: no ricci tag in ricci response


Thanks, can you also provide what was in /var/log/messages and
/var/log/secure when those errors occurred?



Thanks!


Chip Burke







On 8/15/12 3:56 PM, "Chris Feist"  wrote:


On 08/14/12 20:34, Chip Burke wrote:

[root@xanadunode1 ~]# ccs -h xanadunode2 --getconf

This gives me similar results. It sits and spins for a few minutes and
then fails with:

Error: no ricci tag in ricci response

ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf


Can you send me the output of 'rpm -q modcluster' and
'ccs -d -h xanadunode2 --getconf'

Thanks,
Chris



This locks up everything going to GFS2 mounts. Two nodes recovered, the
other didn't and required a fence_node. GFS2 showed this before the
fence.

cd: /datastore/lvol0: Input/output error

Along with the error Error: no ricci tag in ricci response


Chip Burke






On 8/14/12 3:12 PM, "Chris Feist"  wrote:




Can you try using ccs to get the current configuration of that node:
ccs -h  --getconf


As well as use ccs to try and set the conf on that node?
ccs -f  -h  --setconf

This should let us narrow down whether it's an issue with ricci or
luci.

Thanks!
Chris






Re: [Linux-cluster] Ricci doesn't work

2012-08-16 Thread Chip Burke
Indeed. I get in via telnet between machines and get dumped out for not
using SSL, but that is to be expected.

ps -A and top also show ricci happily sitting there listening.

Additionally

#netstat -lnptu | grep 1
tcp        0      0 :::1        :::*        LISTEN      3210/ricci

So ricci is there.



Chip Burke

On 8/16/12 5:10 AM, "Fabio M. Di Nitto"  wrote:

>Maybe a stupid question..
>
>from node1:
>
>telnet node2 1
>
>do you get anything? are the iptables set correctly? (and check also
>from node2 to node1 and from the luci machine to both nodes)
>
>Fabio
>
>On 8/15/2012 11:49 PM, Chip Burke wrote:
>> There is nothing in messages or secure on either node1 or 2 at that time.
>> 
>> Chip Burke
>> 
>> On 8/15/12 4:50 PM, "Chris Feist"  wrote:
>> 
>>> On 08/15/12 15:08, Chip Burke wrote:
 modcluster-0.16.2-18.el6.x86_64

 And

 [root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf
 ***Sending to ricci server:
 >>> version="1.0" name="cluster"
 
name="get_cluster.conf"cc
 i>
 ***Sending End
 ***Received from ricci server
 
 

 ***Receive End
 xanadunode2 password:
 ***Sending to ricci server:
 
 ***Sending End
 ***Received from ricci server
 
 

 ***Receive End
 Error: no ricci tag in ricci response
>>>
>>> Thanks, can you also provide what was in /var/log/messages and
>>> /var/log/secure when those errors occurred?
>>>

 Thanks!

 
 Chip Burke

 On 8/15/12 3:56 PM, "Chris Feist"  wrote:

> On 08/14/12 20:34, Chip Burke wrote:
>> [root@xanadunode1 ~]# ccs -h xanadunode2 --getconf
>>
>> This gives me similar results. It sits and spins for a few minutes and
>> then fails with:
>>
>> Error: no ricci tag in ricci response
>>
>> ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf
>
> Can you send me the output of 'rpm -q modcluster' and
> 'ccs -d -h xanadunode2 --getconf'
>
> Thanks,
> Chris
>
>>
>> This locks up everything going to GFS2 mounts. Two nodes recovered, the
>> other didn't, required a fence_node. GFS2 showed this before the fence.
>>
>> cd: /datastore/lvol0: Input/output error
>>
>> Along with the error Error: no ricci tag in ricci response
>>
>> 
>> Chip Burke
>>
>> On 8/14/12 3:12 PM, "Chris Feist"  wrote:
>>
>>>
>>>
>>> Can you try using ccs to get the current configuration of that node:
>>> ccs -h  --getconf
>>>
>>>
>>> As well as use ccs to try and set the conf on that node?
>>> ccs -f  -h  --setconf
>>>
>>> This should let us narrow down whether it's an issue with ricci or
>>> luci.
>>>
>>> Thanks!
>>> Chris
>>>

Re: [Linux-cluster] Ricci doesn't work

2012-08-16 Thread Chip Burke
Here we go...

node1:
ricci-0.16.2-55.el6.x86_64


node2:
ricci-0.16.2-55.el6.x86_64

[root@xanadunode2 ~]# service ricci stop
Shutting down ricci:   [  OK  ]
[root@xanadunode2 ~]# service ricci start
Starting ricci:[  OK  ]
[root@xanadunode2 ~]# ccs -d -h localhost --getconf
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
localhost password:
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
Error: no ricci tag in ricci response

Of course I fudged the password with XXXs for the list.


Thanks!

Chip Burke

On 8/16/12 12:30 PM, "Chris Feist"  wrote:

>On 08/15/12 16:49, Chip Burke wrote:
>> There is nothing in messages or secure on either node1 or 2 at that time.
>
>Ok, there's something going on with the ricci authentication on that
>node. Can you give me the output of 'rpm -q ricci' as well as do a
>'/etc/init.d/ricci restart'.
>
>Then on the node that is running ricci, try this command:
>ccs -d -h localhost --getconf
>
>(it should ask you for a password; enter the ricci password)
>
>Thanks,
>Chris
>
>> 
>> Chip Burke
>>
>> On 8/15/12 4:50 PM, "Chris Feist"  wrote:
>>
>>> On 08/15/12 15:08, Chip Burke wrote:
 modcluster-0.16.2-18.el6.x86_64

 And

 [root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf
 ***Sending to ricci server:
 >>> version="1.0" name="cluster"
 
name="get_cluster.conf"cc
 i>
 ***Sending End
 ***Received from ricci server
 
 

 ***Receive End
 xanadunode2 password:
 ***Sending to ricci server:
 
 ***Sending End
 ***Received from ricci server
 
 

 ***Receive End
 Error: no ricci tag in ricci response
>>>
>>> Thanks, can you also provide what was in /var/log/messages and
>>> /var/log/secure when those errors occurred?
>>>

 Thanks!

 
 Chip Burke

 On 8/15/12 3:56 PM, "Chris Feist"  wrote:

> On 08/14/12 20:34, Chip Burke wrote:
>> [root@xanadunode1 ~]# ccs -h xanadunode2 --getconf
>>
>> This gives me similar results. It sits and spins for a few minutes and
>> then fails with:
>>
>> Error: no ricci tag in ricci response
>>
>> ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf
>
> Can you send me the output of 'rpm -q modcluster' and
> 'ccs -d -h xanadunode2 --getconf'
>
> Thanks,
> Chris
>
>>
>> This locks up everything going to GFS2 mounts. Two nodes recovered, the
>> other didn't, required a fence_node. GFS2 showed this before the fence.
>>
>> cd: /datastore/lvol0: Input/output error
>>
>> Along with the error Error: no ricci tag in ricci response
>>
>> 
>> Chip Burke
>>
>> On 8/14/12 3:12 PM, "Chris Feist"  wrote:
>>
>>>
>>>
>>> Can you try using ccs to get the current configuration of that node:
>>> ccs -h  --getconf
>>>
>>>
>>> As well as use ccs to try and set the conf on that node?
>>> ccs -f  -h  --setconf
>>>
>>> This should let us narrow down whether it's an issue with ricci or
>>> luci.
>>>
>>> Thanks!
>>> Chris
>>>

Re: [Linux-cluster] Ricci doesn't work

2012-08-16 Thread Chris Feist

On 08/15/12 16:49, Chip Burke wrote:

There is nothing in messages or secure on either node1 or 2 at that time.


Ok, there's something going on with the ricci authentication on that node.  Can 
you give me the output of 'rpm -q ricci' as well as do a '/etc/init.d/ricci 
restart'.


Then on the node that is running ricci, try this command:
ccs -d -h localhost --getconf

(it should ask you for a password; enter the ricci password)

Thanks,
Chris



Chip Burke

On 8/15/12 4:50 PM, "Chris Feist"  wrote:


On 08/15/12 15:08, Chip Burke wrote:

modcluster-0.16.2-18.el6.x86_64

And

[root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
xanadunode2 password:
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
Error: no ricci tag in ricci response


Thanks, can you also provide what was in /var/log/messages and
/var/log/secure when those errors occurred?



Thanks!


Chip Burke

On 8/15/12 3:56 PM, "Chris Feist"  wrote:


On 08/14/12 20:34, Chip Burke wrote:

[root@xanadunode1 ~]# ccs -h xanadunode2 --getconf

This gives me similar results. It sits and spins for a few minutes and
then fails with:

Error: no ricci tag in ricci response

ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf


Can you send me the output of 'rpm -q modcluster' and
'ccs -d -h xanadunode2 --getconf'

Thanks,
Chris



This locks up everything going to GFS2 mounts. Two nodes recovered, the
other didn't, required a fence_node. GFS2 showed this before the fence.

cd: /datastore/lvol0: Input/output error

Along with the error Error: no ricci tag in ricci response


Chip Burke

On 8/14/12 3:12 PM, "Chris Feist"  wrote:




Can you try using ccs to get the current configuration of that node:
ccs -h  --getconf


As well as use ccs to try and set the conf on that node?
ccs -f  -h  --setconf

This should let us narrow down whether it's an issue with ricci or
luci.

Thanks!
Chris

Re: [Linux-cluster] Ricci doesn't work

2012-08-16 Thread Fabio M. Di Nitto
Maybe a stupid question..

from node1:

telnet node2 1

do you get anything? are the iptables set correctly? (and check also
from node2 to node1 and from the luci machine to both nodes)

Fabio
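Fabio's telnet check can also be scripted so it is easy to repeat from every node and from the luci host. The sketch below is a Python 3 illustration, not part of the cluster tools. It assumes ricci's default TCP port 11111 (the port number in the archived messages is garbled) and separates a completed handshake from an outright rejection and from silence. On RHEL/CentOS 6 the stock iptables rules typically end in a REJECT with icmp-host-prohibited, which usually surfaces as "No route to host", so that case is reported on its own.

```python
import errno
import socket

# Assumption: ricci listens on its default TCP port, 11111.
RICCI_PORT = 11111


def probe_tcp(host, port=RICCI_PORT, timeout=5.0):
    """Classify whether a TCP connection to host:port can be opened.

    Returns one of:
      "open"        - the TCP handshake completed (something is listening)
      "refused"     - the peer answered with a reset or port-unreachable
      "unreachable" - e.g. an ICMP host-prohibited reply from a firewall
      "timeout"     - no answer within `timeout` seconds (often a DROP rule)
    Any other socket error (DNS failure, etc.) is re-raised unchanged.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            return "refused"
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "unreachable"
        raise
    sock.close()
    return "open"
```

For example, `for h in ("node1", "node2", "node3"): print(h, probe_tcp(h))` run on each node and on the luci machine (host names hypothetical). An "open" result only shows the port is reachable; ricci still expects an SSL session on top of it, which is why a plain telnet session gets dropped.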

On 8/15/2012 11:49 PM, Chip Burke wrote:
> There is nothing in messages or secure on either node1 or 2 at that time.
> 
> Chip Burke
> 
> On 8/15/12 4:50 PM, "Chris Feist"  wrote:
> 
>> On 08/15/12 15:08, Chip Burke wrote:
>>> modcluster-0.16.2-18.el6.x86_64
>>>
>>> And
>>>
>>> [root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf
>>> ***Sending to ricci server:
>>> >> version="1.0">>> name="cluster">>>
>>> name="get_cluster.conf">>> i>
>>> ***Sending End
>>> ***Received from ricci server
>>> 
>>> 
>>>
>>> ***Receive End
>>> xanadunode2 password:
>>> ***Sending to ricci server:
>>> 
>>> ***Sending End
>>> ***Received from ricci server
>>> 
>>> 
>>>
>>> ***Receive End
>>> Error: no ricci tag in ricci response
>>
>> Thanks, can you also provide what was in /var/log/messages and
>> /var/log/secure when those errors occurred?
>>
>>>
>>> Thanks!
>>>
>>> 
>>> Chip Burke
>>>
>>> On 8/15/12 3:56 PM, "Chris Feist"  wrote:
>>>
 On 08/14/12 20:34, Chip Burke wrote:
> [root@xanadunode1 ~]# ccs -h xanadunode2 --getconf
>
> This gives me similar results. It sits and spins for a few minutes and
> then fails with:
>
> Error: no ricci tag in ricci response
>
> ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf

 Can you send me the output of 'rpm -q modcluster' and
 'ccs -d -h xanadunode2 --getconf'

 Thanks,
 Chris

>
> This locks up everything going to GFS2 mounts. Two nodes recovered, the
> other didn't, required a fence_node. GFS2 showed this before the fence.
>
> cd: /datastore/lvol0: Input/output error
>
> Along with the error Error: no ricci tag in ricci response
>
> 
> Chip Burke
>
> On 8/14/12 3:12 PM, "Chris Feist"  wrote:
>
>>
>>
>> Can you try using ccs to get the current configuration of that node:
>> ccs -h  --getconf
>>
>>
>> As well as use ccs to try and set the conf on that node?
>> ccs -f  -h  --setconf
>>
>> This should let us narrow down whether it's an issue with ricci or
>> luci.
>>
>> Thanks!
>> Chris
>>

Re: [Linux-cluster] Ricci doesn't work

2012-08-15 Thread Chip Burke
There is nothing in messages or secure on either node1 or 2 at that time.

Chip Burke

On 8/15/12 4:50 PM, "Chris Feist"  wrote:

>On 08/15/12 15:08, Chip Burke wrote:
>> modcluster-0.16.2-18.el6.x86_64
>>
>> And
>>
>> [root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf
>> ***Sending to ricci server:
>> >version="1.0">> name="cluster">> 
>>name="get_cluster.conf">>i>
>> ***Sending End
>> ***Received from ricci server
>> 
>> 
>>
>> ***Receive End
>> xanadunode2 password:
>> ***Sending to ricci server:
>> 
>> ***Sending End
>> ***Received from ricci server
>> 
>> 
>>
>> ***Receive End
>> Error: no ricci tag in ricci response
>
>Thanks, can you also provide what was in /var/log/messages and
>/var/log/secure when those errors occurred?
>
>>
>> Thanks!
>>
>> 
>> Chip Burke
>>
>> On 8/15/12 3:56 PM, "Chris Feist"  wrote:
>>
>>> On 08/14/12 20:34, Chip Burke wrote:
 [root@xanadunode1 ~]# ccs -h xanadunode2 --getconf

 This gives me similar results. It sits and spins for a few minutes and
 then fails with:

 Error: no ricci tag in ricci response

 ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf
>>>
>>> Can you send me the output of 'rpm -q modcluster' and
>>> 'ccs -d -h xanadunode2 --getconf'
>>>
>>> Thanks,
>>> Chris
>>>

 This locks up everything going to GFS2 mounts. Two nodes recovered, the
 other didn't, required a fence_node. GFS2 showed this before the fence.

 cd: /datastore/lvol0: Input/output error

 Along with the error Error: no ricci tag in ricci response

 
 Chip Burke

 On 8/14/12 3:12 PM, "Chris Feist"  wrote:

>
>
> Can you try using ccs to get the current configuration of that node:
> ccs -h  --getconf
>
>
> As well as use ccs to try and set the conf on that node?
> ccs -f  -h  --setconf
>
> This should let us narrow down whether it's an issue with ricci or
> luci.
>
> Thanks!
> Chris
>

Re: [Linux-cluster] Ricci doesn't work

2012-08-15 Thread Chris Feist

On 08/15/12 15:08, Chip Burke wrote:

modcluster-0.16.2-18.el6.x86_64

And

[root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
xanadunode2 password:
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
Error: no ricci tag in ricci response


Thanks, can you also provide what was in /var/log/messages and /var/log/secure 
when those errors occurred?




Thanks!


Chip Burke

On 8/15/12 3:56 PM, "Chris Feist"  wrote:


On 08/14/12 20:34, Chip Burke wrote:

[root@xanadunode1 ~]# ccs -h xanadunode2 --getconf

This gives me similar results. It sits and spins for a few minutes and
then fails with:

Error: no ricci tag in ricci response

ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf


Can you send me the output of 'rpm -q modcluster' and
'ccs -d -h xanadunode2 --getconf'

Thanks,
Chris



This locks up everything going to GFS2 mounts. Two nodes recovered, the
other didn't, required a fence_node. GFS2 showed this before the fence.

cd: /datastore/lvol0: Input/output error

Along with the error Error: no ricci tag in ricci response


Chip Burke

On 8/14/12 3:12 PM, "Chris Feist"  wrote:




Can you try using ccs to get the current configuration of that node:
ccs -h  --getconf


As well as use ccs to try and set the conf on that node?
ccs -f  -h  --setconf

This should let us narrow down whether it's an issue with ricci or
luci.

Thanks!
Chris

Re: [Linux-cluster] Ricci doesn't work

2012-08-15 Thread Chip Burke
modcluster-0.16.2-18.el6.x86_64

And

[root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
xanadunode2 password:
***Sending to ricci server:

***Sending End
***Received from ricci server



***Receive End
Error: no ricci tag in ricci response
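As archived, Chip's `ccs -d` transcripts show nothing between `***Received from ricci server` and `***Receive End`, so the client has no XML document in which to find a `ricci` element. The Python 3 sketch below makes that reasoning concrete: one hypothetical helper pulls the received blocks out of captured `ccs -d` output, and another applies the kind of check the error message implies. It is an illustration only, not how ccs is implemented; the one thing it assumes about ricci's reply format is that it should be an XML document whose root element is `ricci`.

```python
import xml.etree.ElementTree as ET

START = "***Received from ricci server"
END = "***Receive End"


def received_payloads(debug_output):
    """Return the text of every received block in captured `ccs -d` output."""
    payloads, current = [], None
    for line in debug_output.splitlines():
        if line.startswith(START):
            current = []  # begin collecting a new block
        elif line.startswith(END) and current is not None:
            payloads.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    return payloads


def check_ricci_response(payload):
    """Parse payload and return its root element, assuming it must be <ricci ...>.

    Raises ValueError for an empty payload, malformed XML, or another root tag.
    """
    text = payload.strip()
    if not text:
        raise ValueError("empty response from ricci")
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        raise ValueError("response is not well-formed XML: %s" % exc) from exc
    if root.tag != "ricci":
        raise ValueError("no ricci tag in ricci response (root is <%s>)" % root.tag)
    return root
```

Fed the transcript above, the helper would return two whitespace-only blocks, and the check fails on the first one before any XML parsing happens. That points at ricci closing or stalling the session, rather than at malformed XML.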

Thanks!


Chip Burke

On 8/15/12 3:56 PM, "Chris Feist"  wrote:

>On 08/14/12 20:34, Chip Burke wrote:
>> [root@xanadunode1 ~]# ccs -h xanadunode2 --getconf
>>
>> This gives me similar results. It sits and spins for a few minutes and
>> then fails with:
>>
>> Error: no ricci tag in ricci response
>>
>> ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf
>
>Can you send me the output of 'rpm -q modcluster' and
>'ccs -d -h xanadunode2 --getconf'
>
>Thanks,
>Chris
>
>>
>> This locks up everything going to GFS2 mounts. Two nodes recovered, the
>> other didn't, required a fence_node. GFS2 showed this before the fence.
>>
>> cd: /datastore/lvol0: Input/output error
>>
>> Along with the error Error: no ricci tag in ricci response
>>
>> 
>> Chip Burke
>>
>> On 8/14/12 3:12 PM, "Chris Feist"  wrote:
>>
>>>
>>>
>>> Can you try using ccs to get the current configuration of that node:
>>> ccs -h  --getconf
>>>
>>>
>>> As well as use ccs to try and set the conf on that node?
>>> ccs -f  -h  --setconf
>>>
>>> This should let us narrow down whether it's an issue with ricci or luci.
>>>
>>> Thanks!
>>> Chris
>>>

Re: [Linux-cluster] Ricci doesn't work

2012-08-15 Thread Chris Feist

On 08/14/12 20:34, Chip Burke wrote:

[root@xanadunode1 ~]# ccs -h xanadunode2 --getconf

This gives me similar results. It sits and spins for a few minutes and
then fails with:

Error: no ricci tag in ricci response

ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf


Can you send me the output of 'rpm -q modcluster' and
'ccs -d -h xanadunode2 --getconf'

Thanks,
Chris



This locks up everything going to GFS2 mounts. Two nodes recovered, the
other didn't, required a fence_node. GFS2 showed this before the fence.

cd: /datastore/lvol0: Input/output error

Along with the error Error: no ricci tag in ricci response


Chip Burke

On 8/14/12 3:12 PM, "Chris Feist"  wrote:




Can you try using ccs to get the current configuration of that node:
ccs -h  --getconf


As well as use ccs to try and set the conf on that node?
ccs -f  -h  --setconf

This should let us narrow down whether it's an issue with ricci or luci.

Thanks!
Chris

Re: [Linux-cluster] Ricci doesn't work

2012-08-14 Thread Chip Burke
[root@xanadunode1 ~]# ccs -h xanadunode2 --getconf

This gives me similar results. It sits and spins for a few minutes and
then fails with:

Error: no ricci tag in ricci response
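Because `ccs --getconf` can sit for minutes before failing, it may help to bound each attempt when repeating these tests across nodes. Below is a minimal Python 3 sketch using the standard `subprocess` module. The 60-second default is an arbitrary assumption. It also assumes the node is already authenticated, so ccs does not stop to prompt for the ricci password; an interactive prompt would simply run into the timeout.

```python
import subprocess


def run_with_timeout(cmd, timeout=60.0):
    """Run cmd and return (returncode, stdout, stderr), or None on timeout.

    stdin is redirected from /dev/null so the command cannot wait on it;
    subprocess.run kills the child process if the timeout expires.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode, proc.stdout, proc.stderr
```

For example, `run_with_timeout(["ccs", "-d", "-h", "xanadunode2", "--getconf"])` returns `None` if ricci never answers within the limit, and otherwise returns the exit status plus the captured debug transcript for later inspection.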

ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf

This locks up everything going to GFS2 mounts. Two nodes recovered, the
other didn't, required a fence_node. GFS2 showed this before the fence.

cd: /datastore/lvol0: Input/output error

Along with the error Error: no ricci tag in ricci response


Chip Burke

On 8/14/12 3:12 PM, "Chris Feist"  wrote:

>
>
>Can you try using ccs to get the current configuration of that node:
>ccs -h  --getconf
>
>
>As well as use ccs to try and set the conf on that node?
>ccs -f  -h  --setconf
>
>This should let us narrow down whether it's an issue with ricci or luci.
>
>Thanks!
>Chris
>

Re: [Linux-cluster] Ricci doesn't work

2012-08-14 Thread Chris Feist

On 08/13/12 16:38, Chip Burke wrote:

Ricci is seemingly not working through either Luci or cman_tool. There doesn't
seem to be a lot of logging to go off of (at least I haven't found it) but what
I did find in the Luci log is as follows:

15:51:06,793 ERROR [luci.lib.ricci_communicator] Error receiving header from node2.domain.local:1
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/luci/lib/ricci_communicator.py", line 121, in __init__
    hello = self.__receive(self.__timeout_init)
  File "/usr/lib64/python2.6/site-packages/luci/lib/ricci_communicator.py", line 503, in __receive
    errstr = _('Error reading from %s:%d: %s') \
  File "/usr/lib/python2.6/site-packages/pylons/i18n/translation.py", line 106, in ugettext
    return pylons.translator.ugettext(value)
  File "/usr/lib/python2.6/site-packages/paste/registry.py", line 137, in __getattr__
    return getattr(self._current_obj(), attr)
  File "/usr/lib/python2.6/site-packages/paste/registry.py", line 197, in _current_obj
    'thread' % self.name__)
TypeError: No object (name: translator) has been registered for this thread
15:51:06,793 ERROR [luci.lib.ricci_helpers] Error receiving header from node2..local:1
15:51:06,793 ERROR [luci.lib.ricci_helpers] Error retrieving batch number from node3.X.local: Error receiving header from node3.X.local:1
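Judging from the traceback, luci hit an error while reading from ricci, and then building the error message itself failed. `_()` resolves to `pylons.translator.ugettext`, and `pylons.translator` is a per-thread proxy with nothing registered in the thread doing the ricci I/O, so the original error is replaced by a `TypeError`. The Python 3 sketch below models only that failure mode, with `threading.local` standing in for `paste.registry`. All names are hypothetical; it is neither luci's code nor the upstream patch. The fallback returns the untranslated string so the real cause reaches the log.

```python
import threading

# Stand-in for paste.registry: each thread sees only what it registered itself.
_registry = threading.local()


def register_translator(translate):
    _registry.translator = translate


def ugettext(msg):
    translator = getattr(_registry, "translator", None)
    if translator is None:
        # Same failure as in luci.log when the current thread has no translator.
        raise TypeError("No object (name: translator) has been registered for this thread")
    return translator(msg)


def safe_ugettext(msg):
    """Translate if possible, otherwise return the message untranslated."""
    try:
        return ugettext(msg)
    except TypeError:
        return msg


def format_receive_error(host, port, reason, gettext=ugettext):
    # Hypothetical analogue of the message built in ricci_communicator.__receive.
    return gettext("Error reading from %s:%d: %s") % (host, port, reason)
```

Called from a thread that never registered a translator, `format_receive_error` raises the same `TypeError` as in luci.log, while passing `gettext=safe_ugettext` yields the plain "Error reading from ..." text that would explain what actually went wrong with ricci.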

Cluster config I am trying to push:

Running config:

Any ideas? SCP and reboots are fun and all, but I would love Ricci to work.


Can you try using ccs to get the current configuration of that node:
ccs -h  --getconf


As well as use ccs to try and set the conf on that node?
ccs -f  -h  --setconf

This should let us narrow down whether it's an issue with ricci or luci.

Thanks!
Chris



Thanks!

Re: [Linux-cluster] Ricci doesn't work

2012-08-14 Thread Chip Burke
CentOS 6.2, cman version 3.0.12.1

As for the syslog, I have nothing on either the node sending or the nodes
receiving. There is no output at all. And after troubleshooting further,

#cman_tool version -r -S

Works if I do a manual push via scp, so it must be getting hung up on the
file transfer. Running ccs_sync manually doesn't give me any different
results.  After a minute or two things fail with the output:

The connection to node1 died unexpectedly
The connection to node3 died unexpectedly
The connection to node2 died unexpectedly


And still nothing in the syslog. I also checked the secure log to see if
authentication was bombing out, but there was nothing conclusive there
either.

On 8/13/12 7:58 PM, "Digimer"  wrote:

>
>What OS and cluster versions? What is in syslog on the three nodes when
>this occurs?
>
>-- 
>Digimer
>Papers and Projects: https://alteeve.com


Re: [Linux-cluster] Ricci doesn't work

2012-08-13 Thread Digimer

On 08/13/2012 05:38 PM, Chip Burke wrote:

Ricci is seemingly not working through either Luci or cman_tool. There
doesn't seem to be a lot of logging to go off of (at least I haven't
found it) but what I did find in the Luci log is as follows:

15:51:06,793 ERROR [luci.lib.ricci_communicator] Error receiving header from node2.domain.local:1
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/luci/lib/ricci_communicator.py", line 121, in __init__
    hello = self.__receive(self.__timeout_init)
  File "/usr/lib64/python2.6/site-packages/luci/lib/ricci_communicator.py", line 503, in __receive
    errstr = _('Error reading from %s:%d: %s') \
  File "/usr/lib/python2.6/site-packages/pylons/i18n/translation.py", line 106, in ugettext
    return pylons.translator.ugettext(value)
  File "/usr/lib/python2.6/site-packages/paste/registry.py", line 137, in __getattr__
    return getattr(self._current_obj(), attr)
  File "/usr/lib/python2.6/site-packages/paste/registry.py", line 197, in _current_obj
    'thread' % self.name__)
TypeError: No object (name: translator) has been registered for this thread
15:51:06,793 ERROR [luci.lib.ricci_helpers] Error receiving header from node2..local:1
15:51:06,793 ERROR [luci.lib.ricci_helpers] Error retrieving batch number from node3.X.local: Error receiving header from node3.X.local:1

Cluster config I am trying to push:

Running config:

Any ideas? SCP and reboots are fun and all, but I would love Ricci to work.

Thanks!


What OS and cluster versions? What is in syslog on the three nodes when 
this occurs?


--
Digimer
Papers and Projects: https://alteeve.com