Re: [Linux-cluster] Ricci doesn't work
On 12/09/12 14:00 +0200, Jan Pokorný wrote: > cat
Re: [Linux-cluster] Ricci doesn't work
On 10/09/12 21:54 +, Chip Burke wrote:
> Well, after a few days of testing it seems luci/ricci is still flakey.

At first glance, there seems to be more than the "XML entities" problem
involved. This is because luci, unlike ccs and ccs_sync ("cman_tool version"
uses ccs_sync under the hood), should not suffer from that one. So this is
probably yet another separate issue...

> If I update things via luci, the cluster.conf on the local machine with
> luci updates, but nothing pushes out to other nodes. If I then manually run
> ccs_sync on the node with the new configuration, things push out to the
> other nodes fine. While this is wonky, at least it is consistent and
> repeatable so I can do things in this manner until fixes are in. Though,
> certainly let me know if you want any more logs or whatever from me.

To get a better error message in luci.log as something to start with, could
you please apply a workaround for that "no translator" issue? The recipe,
based on the real patch that fixes the mentioned bug, is as follows (as root
on the host with luci installed):

---
pushd $(rpm --eval %python_sitearch)
cat
Re: [Linux-cluster] Ricci doesn't work
Well, after a few days of testing it seems luci/ricci is still flakey. If I update things via luci, the cluster.conf on the local machine with luci updates, but nothing pushes out to other nodes. If I then manually run ccs_sync on the node with the new configuration, things push out to the other nodes fine. While this is wonky, at least it is consistent and repeatable so I can do things in this manner until fixes are in. Though, certainly let me know if you want any more logs or whatever from me. Chip Burke -- Linux-cluster mailing list Linux-cluster@redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster
Re: [Linux-cluster] Ricci doesn't work
On 06/09/12 18:44 +, Chip Burke wrote:
> I only eliminated the second & and that was all it took. My guess is this
> is some edge case where the two &s in the string made it break where as a
> single & did not cause things to escape or what have you.

Oh, I now see what is going on. Once you succeeded with the alpha-only
password, authentication became based merely on the certificates (a password
is no longer involved, no matter whether it gets changed or not). Hence, the
XML-unsafe character in the password has no chance to puzzle ricci, unlike in
the discussed case where the initial password authentication was yet to be
done.

Anyway, the mentioned bugs are there to allow even these problematic
characters to be used safely.

> All that said, thanks again!

You're welcome.

-- 
Jan
Re: [Linux-cluster] Ricci doesn't work
I only eliminated the second & and that was all it took. My guess is this
is some edge case where the two &s in the string made it break where as a
single & did not cause things to escape or what have you.

All that said, thanks again!

Chip Burke

On 9/6/12 2:32 PM, "Jan Pokorný" wrote:

>On 06/09/12 15:11 +, Chip Burke wrote:
>> Well that was an easy enough fix finally. I thought perhaps the password
>> for the VMWare fence account was the issue and updated cluster.conf with a
>> place holder password of 'password'. Ricci would not work. So I updated
>> the actual ricci user account to use a password of 'password' and
>> restarted Ricci on all of the nodes. Ricci now works. So indeed, it
>> certainly did not like a character in the password I was using which was
>> 65peC&E$taFRE&U. In all likelihood the & was the problem character.
>
>Bull's eye.
>
>> On 9/6/12 6:11 AM, "Jan Pokorný" wrote:
>>> The easiest explanation is that this XML is not well-formed, which
>>> would boil down to your obfuscated password (not offending it,
>>> it's highly reasonable). Did your password contain any XML-nonfriendly
>>> character, such as one of '<>"&'? If so, could you please try digits,
>>> ASCII letters and surely-safe characters only (dot, dash, etc.)?
>
>Admittedly, this obstacle should be easier to track down, if allowed
>to exist at all (see below).
>
>> To confirm that hypothesis, I changed the Ricci password to 65peC&E$taFREU
>> and everything still worked as expected.
>
>While we are at it, it should have been "65peCE$taFREU" (no & char),
>shouldn't it?
>
>> From your stand point I don't know if that needs to be coded around or
>> what, but at least we know how to reproduce the issue.
>
>Thanks for bringing up this part we should be more careful about.
>As a starter, I filed these bugs:
>
>- ricci (needs to understand the XML entities properly)
>  https://bugzilla.redhat.com/show_bug.cgi?id=855121
>
>(clients need to do a proper encoding into XML entities)
>- luci: https://bugzilla.redhat.com/show_bug.cgi?id=855112
>- ccs: https://bugzilla.redhat.com/show_bug.cgi?id=855117
>- ccs_sync: https://bugzilla.redhat.com/show_bug.cgi?id=855120
>
>Also based on studying some relevant parts of the ricci's code,
>I've added a few private suggestions under the umbrella of bug 849233.
>
>> Thanks again for sticking with me on this even if the cause was somewhat
>> silly.
>
>To be fair, no matter how improbable the reason for not working
>correctly is (let's keep complex configuration errors aside), one
>can expect such things to be self-evident (via the messages, logs, etc.),
>not an exercise left to the user and indirectly back to the
>maintainer :-)
>
>Thanks,
>Jan
Re: [Linux-cluster] Ricci doesn't work
On 06/09/12 15:11 +, Chip Burke wrote:
> Well that was an easy enough fix finally. I thought perhaps the password
> for the VMWare fence account was the issue and updated cluster.conf with a
> place holder password of 'password'. Ricci would not work. So I updated
> the actual ricci user account to use a password of 'password' and
> restarted Ricci on all of the nodes. Ricci now works. So indeed, it
> certainly did not like a character in the password I was using which was
> 65peC&E$taFRE&U. In all likelihood the & was the problem character.

Bull's eye.

> On 9/6/12 6:11 AM, "Jan Pokorný" wrote:
>> The easiest explanation is that this XML is not well-formed, which
>> would boil down to your obfuscated password (not offending it,
>> it's highly reasonable). Did your password contain any XML-nonfriendly
>> character, such as one of '<>"&'? If so, could you please try digits,
>> ASCII letters and surely-safe characters only (dot, dash, etc.)?

Admittedly, this obstacle should be easier to track down, if allowed
to exist at all (see below).

> To confirm that hypothesis, I changed the Ricci password to 65peC&E$taFREU
> and everything still worked as expected.

While we are at it, it should have been "65peCE$taFREU" (no & char),
shouldn't it?

> From your stand point I don't know if that needs to be coded around or what,
> but at least we know how to reproduce the issue.

Thanks for bringing up this part we should be more careful about.
As a starter, I filed these bugs:

- ricci (needs to understand the XML entities properly)
  https://bugzilla.redhat.com/show_bug.cgi?id=855121

(clients need to do a proper encoding into XML entities)
- luci: https://bugzilla.redhat.com/show_bug.cgi?id=855112
- ccs: https://bugzilla.redhat.com/show_bug.cgi?id=855117
- ccs_sync: https://bugzilla.redhat.com/show_bug.cgi?id=855120

Also based on studying some relevant parts of the ricci's code,
I've added a few private suggestions under the umbrella of bug 849233.

> Thanks again for sticking with me on this even if the cause was somewhat
> silly.

To be fair, no matter how improbable the reason for not working
correctly is (let's keep complex configuration errors aside), one
can expect such things to be self-evident (via the messages, logs, etc.),
not an exercise left to the user and indirectly back to the
maintainer :-)

Thanks,
Jan
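The two sides of the bugs listed above (clients must encode the password into XML entities, the receiver must decode them) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `<request function="authenticate" password="..."/>` element, its attribute names, and the three helper functions are made up for this example and are not taken from ricci's real protocol or from the ccs/luci sources.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def build_auth_request(password: str, escape: bool = True) -> str:
    """Build a hypothetical request carrying the password as an attribute.

    The element and attribute names are illustrative only; they are NOT
    ricci's actual wire format.
    """
    if escape:
        # quoteattr() adds the surrounding quotes and turns & < > into entities.
        attr = quoteattr(password)
    else:
        # Naive string pasting: what a client does when it forgets to escape.
        attr = '"' + password + '"'
    return '<request function="authenticate" password=' + attr + "/>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def round_trip_password(xml_text: str) -> str:
    """Parse the request and return the password the receiver would see."""
    return ET.fromstring(xml_text).get("password")


if __name__ == "__main__":
    secret = "65peC&E$taFRE&U"  # the password quoted in this thread
    for escape in (False, True):
        request = build_auth_request(secret, escape=escape)
        print(escape, is_well_formed(request), request)
```

With naive pasting, any bare `&` in the password yields a document that a conforming parser rejects, whether there is one `&` or two; with escaping, the parser hands back the original password unchanged.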
Re: [Linux-cluster] Ricci doesn't work
Well that was an easy enough fix finally. I thought perhaps the password
for the VMWare fence account was the issue and updated cluster.conf with a
place holder password of 'password'. Ricci would not work. So I updated
the actual ricci user account to use a password of 'password' and
restarted Ricci on all of the nodes. Ricci now works. So indeed, it
certainly did not like a character in the password I was using which was
65peC&E$taFRE&U. In all likelihood the & was the problem character.

To confirm that hypothesis, I changed the Ricci password to 65peC&E$taFREU
and everything still worked as expected.

So there is our answer. From your stand point I don't know if that needs to
be coded around or what, but at least we know how to reproduce the issue.

Thanks again for sticking with me on this even if the cause was somewhat
silly.

Chip Burke

On 9/6/12 6:11 AM, "Jan Pokorný" wrote:

>On 05/09/12 16:21 +, Chip Burke wrote:
>> This gives me the same behavior.
>
>Sorry, I can now see it was a bad workaround guess from the beginning.
>
>I think the strace logs you provided contain good-enough information
>about the issue and I am still scratching my head.
>
>Part of it is that there are two sub-issues and I am not sure if
>they are isolated or there is a causal relationship.
>
>The first one is hidden and very probably innocent -- the one present
>in ricci.strace.5573 -- EPIPE/SIGPIPE. First, there is an extra empty
>read because (I think) the client of ricci has shut down the connection
>first (seems like an ungraceful way of disconnection, but still tolerable),
>but the ricci side, despite this fact, is trying to send a closure notify
>message so as to achieve the expected graceful disconnection.
>Apparently, this fails in this case, accompanied by EPIPE/SIGPIPE.
>
>However, the second one -- inability to process the request, failing
>upon timeout as can be seen in ricci.strace.5575 -- is severe.
>
>> read(5, "\27\3\1\0p", 5)= 5
>> read(5, "\373\303\16\20>\202%\34\211\214b\\l\260\354\3662\312\272\21<\t\r\235S\31o\361\21\265\266p"..., 112) = 112
>
>Here the two trailing bytes out of the first five (0x00 0x70) indicate the
>whole size of the message, which is indeed read as expected (112).
>Ricci should *not* keep trying to read past this point as the whole XML
>message should have been received at this moment. But for some
>reason it does (see subsequent poll with POLLIN flag).
>
>The easiest explanation is that this XML is not well-formed, which
>would boil down to your obfuscated password (not offending it,
>it's highly reasonable). Did your password contain any XML-nonfriendly
>character, such as one of '<>"&'? If so, could you please try digits,
>ASCII letters and surely-safe characters only (dot, dash, etc.)?
>
>As outlined, these two issues can be even interconnected (having
>OpenSSL error queue at the main thread, which does not get cleared
>explicitly as it probably should, in mind). I am going to look more
>into it (perhaps put together a simple client for you to try) after
>knowing your situation with the password.
>
>If there is nothing suspicious about your password to authenticate
>against ricci, the inverse of the previously suggested workaround could
>be tried (manually pre-authenticating ccs against ricci);
>from the host ccs is being run at, something along the lines:
>
>$ ccs ... # if ~/.ccs/cacert.pem does not exist yet
>$ RICCIHOST=machina
>$ RICCICERT=$(mktemp -u /var/lib/ricci/certs/clients/client_cert_XXXXXX)
>$ scp ~/.ccs/cacert.pem root@$RICCIHOST:$RICCICERT
>$ ssh root@$RICCIHOST chown ricci:root $RICCICERT
>$ ssh root@$RICCIHOST restorecon $RICCICERT # if using SELinux
>$ ssh root@$RICCIHOST service ricci restart
>$ ccs ...
>
>Thanks,
>Jan
Re: [Linux-cluster] Ricci doesn't work
On 05/09/12 16:21 +, Chip Burke wrote:
> This gives me the same behavior.

Sorry, I can now see it was a bad workaround guess from the beginning.

I think the strace logs you provided contain good-enough information
about the issue and I am still scratching my head.

Part of it is that there are two sub-issues and I am not sure if
they are isolated or there is a causal relationship.

The first one is hidden and very probably innocent -- the one present
in ricci.strace.5573 -- EPIPE/SIGPIPE. First, there is an extra empty
read because (I think) the client of ricci has shut down the connection
first (seems like an ungraceful way of disconnection, but still tolerable),
but the ricci side, despite this fact, is trying to send a closure notify
message so as to achieve the expected graceful disconnection.
Apparently, this fails in this case, accompanied by EPIPE/SIGPIPE.

However, the second one -- inability to process the request, failing
upon timeout as can be seen in ricci.strace.5575 -- is severe.

> read(5, "\27\3\1\0p", 5)= 5
> read(5, "\373\303\16\20>\202%\34\211\214b\\l\260\354\3662\312\272\21<\t\r\235S\31o\361\21\265\266p"..., 112) = 112

Here the two trailing bytes out of the first five (0x00 0x70) indicate the
whole size of the message, which is indeed read as expected (112).
Ricci should *not* keep trying to read past this point as the whole XML
message should have been received at this moment. But for some
reason it does (see subsequent poll with POLLIN flag).

The easiest explanation is that this XML is not well-formed, which
would boil down to your obfuscated password (not offending it,
it's highly reasonable). Did your password contain any XML-nonfriendly
character, such as one of '<>"&'? If so, could you please try digits,
ASCII letters and surely-safe characters only (dot, dash, etc.)?

As outlined, these two issues can be even interconnected (having
OpenSSL error queue at the main thread, which does not get cleared
explicitly as it probably should, in mind). I am going to look more
into it (perhaps put together a simple client for you to try) after
knowing your situation with the password.

If there is nothing suspicious about your password to authenticate
against ricci, the inverse of the previously suggested workaround could
be tried (manually pre-authenticating ccs against ricci);
from the host ccs is being run at, something along the lines:

$ ccs ... # if ~/.ccs/cacert.pem does not exist yet
$ RICCIHOST=machina
$ RICCICERT=$(mktemp -u /var/lib/ricci/certs/clients/client_cert_XXXXXX)
$ scp ~/.ccs/cacert.pem root@$RICCIHOST:$RICCICERT
$ ssh root@$RICCIHOST chown ricci:root $RICCICERT
$ ssh root@$RICCIHOST restorecon $RICCICERT # if using SELinux
$ ssh root@$RICCIHOST service ricci restart
$ ccs ...

Thanks,
Jan
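The reading of the first five bytes in the strace excerpt above can be reproduced with a short Python sketch. It assumes those bytes are a standard TLS record header (one byte content type, two bytes protocol version, two bytes big-endian length); the function name and the lookup table are hypothetical helpers written for this illustration, not part of ricci.

```python
import struct

# Content type codes defined for the TLS record layer.
TLS_CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def parse_tls_record_header(header: bytes) -> dict:
    """Decode the 5-byte header that precedes every TLS record."""
    if len(header) != 5:
        raise ValueError("a TLS record header is exactly 5 bytes long")
    # One byte content type, two bytes version, two bytes big-endian length.
    content_type, major, minor, length = struct.unpack("!BBBH", header)
    return {
        "content_type": TLS_CONTENT_TYPES.get(content_type, "unknown"),
        "version": (major, minor),
        "length": length,
    }


if __name__ == "__main__":
    # The five bytes from the first read() in the strace excerpt; strace
    # prints them with octal escapes, which Python bytes literals also accept.
    print(parse_tls_record_header(b"\27\3\1\0p"))
```

For the bytes shown in the trace this gives an application-data record, version (3, 1), and a length of 112, which matches the size of the following read. Note that the length describes the encrypted record payload, so strace alone cannot show the XML inside it.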
Re: [Linux-cluster] Ricci doesn't work
This gives me the same behavior.

On each node I ran:

#rm -f /var/lib/ricci/certs/clients/*
#service ricci restart

Then from one node I simply incremented the version in cluster.conf and ran

# cman_tool version -r
You have not authenticated to the ricci daemon on xanadunode1
Password:
You have not authenticated to the ricci daemon on xanadunode2
Password:
You have not authenticated to the ricci daemon on xanadunode3
Password:
The connection to xanadunode1 died unexpectedly
The connection to xanadunode2 died unexpectedly
The connection to xanadunode3 died unexpectedly
cman_tool: ccs_sync failed. If you have distributed the config file
yourself, try re-running with -S

Let me know if there is any other info I can give you. Thanks again!

Chip Burke

On 9/5/12 9:34 AM, "Jan Pokorný" wrote:

>On 20/08/12 14:51 +, Chip Burke wrote:
>> Thanks for sticking with me on this.
>
>Sorry for the delay.
>
>The traceback from your initial email, which ended with:
>> TypeError: No object (name: translator) has been registered for this
>> thread
>
>I overlooked this one, but independently hit it too [1].
>
>But this is orthogonal to the main issue you are stating
>and which I am having trouble figuring out.
>
>Could you please try the following as a possible workaround?
>At the particular node (or all within the cluster), try running
>rm -f /var/lib/ricci/certs/clients/* and restarting ricci.
>As a consequence, you may be prompted for the password to authenticate
>against ricci instance(s) and/or re-add the cluster in luci even if
>this was already done before.
>
>[1] https://bugzilla.redhat.com/show_bug.cgi?id=853151
>
>Thanks,
>Jan
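The manual step mentioned above ("incremented the version in cluster.conf") can be sketched in Python as follows. This is a rough illustration, assuming the root element is `<cluster>` and carries an integer `config_version` attribute; the function name is hypothetical, and rewriting the file through ElementTree drops XML comments and the original formatting, so it is better tried on a copy.

```python
import xml.etree.ElementTree as ET


def bump_config_version(conf_text: str) -> str:
    """Return the cluster.conf text with config_version increased by one."""
    root = ET.fromstring(conf_text)
    if root.tag != "cluster":
        raise ValueError("expected <cluster> as the root element")
    # config_version is stored as a decimal string in the root element.
    current = int(root.attrib["config_version"])
    root.set("config_version", str(current + 1))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # A made-up, heavily shortened configuration used only for the demo.
    sample = '<cluster name="demo" config_version="41"><clusternodes/></cluster>'
    print(bump_config_version(sample))
```

Bumping the number only changes the local file; propagating it still goes through `cman_tool version -r` (or a manual `ccs_sync`), which is exactly the step that fails in this report.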
Re: [Linux-cluster] Ricci doesn't work
On 20/08/12 14:51 +, Chip Burke wrote:
> Thanks for sticking with me on this.

Sorry for the delay.

The traceback from your initial email, which ended with:
> TypeError: No object (name: translator) has been registered for this
> thread

I overlooked this one, but independently hit it too [1].

But this is orthogonal to the main issue you are stating
and which I am having trouble figuring out.

Could you please try the following as a possible workaround?
At the particular node (or all within the cluster), try running
rm -f /var/lib/ricci/certs/clients/* and restarting ricci.
As a consequence, you may be prompted for the password to authenticate
against ricci instance(s) and/or re-add the cluster in luci even if
this was already done before.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=853151

Thanks,
Jan
Re: [Linux-cluster] Ricci doesn't work
Thanks for sticking with me on this. Here's the log:
https://dl.dropbox.com/u/8137282/strace.ricci.tgz

Chip Burke

On 8/20/12 10:28 AM, "Jan Pokorný" wrote:

>Hello Chip,
>
>On 17/08/12 15:14 +, Chip Burke wrote:
>> Libvirt is not installed on any of the hosts.
>
>could you please provide a strace log (best as gzipped attachment sent
>off-list, or, you can use e.g. fpaste.org and provide a link if the
>log is not so huge).
>
>Something like (untested):
>
># strace -fp $(pidof ricci) -ff -o ricci
># tar czf ricci-strace.tar.gz ricci.*
>
>While the strace is running, please let luci or ccs access ricci
>so these attempts are covered well in the strace log.
>
>Thanks,
>Jan
Re: [Linux-cluster] Ricci doesn't work
Hello Chip,

On 17/08/12 15:14 +, Chip Burke wrote:
> Libvirt is not installed on any of the hosts.

could you please provide a strace log (best as gzipped attachment sent
off-list, or, you can use e.g. fpaste.org and provide a link if the
log is not so huge).

Something like (untested):

# strace -fp $(pidof ricci) -ff -o ricci
# tar czf ricci-strace.tar.gz ricci.*

While the strace is running, please let luci or ccs access ricci
so these attempts are covered well in the strace log.

Thanks,
Jan
Re: [Linux-cluster] Ricci doesn't work
Libvirt is not installed on any of the hosts.

Chip Burke

On 8/16/12 6:22 PM, "Chris Feist" wrote:

>Can you try removing libvirt (rpm -e libvirt) and restarting ricci
>(/etc/init.d/ricci restart).
>
>And run that command again?
Re: [Linux-cluster] Ricci doesn't work
On 08/16/12 15:09, Chip Burke wrote: time ccs -d -h xanadunode2 --getconf Gives me real2m7.027s user0m0.053s sys 0m0.013s Can you try removing libvirt (rpm -e libvirt) and restarting ricci (/etc/init.d/ricci restart). And run that command again? So it sits for quite a while. Chip Burke On 8/16/12 3:52 PM, "Chris Feist" wrote: On 08/16/12 13:52, Chip Burke wrote: Here we goŠ node1: ricci-0.16.2-55.el6.x86_64 node2: ricci-0.16.2-55.el6.x86_64 [root@xanadunode2 ~]# service ricci stop Shutting down ricci: [ OK ] [root@xanadunode2 ~]# service ricci start Starting ricci:[ OK ] [root@xanadunode2 ~]# ccs -d -h localhost --getconf ***Sending to ricci server: ***Sending End ***Received from ricci server ***Receive End localhost password: ***Sending to ricci server: ***Sending End ***Received from ricci server ***Receive End Error: no ricci tag in ricci response Of course I fudged the password with XXXs for the list. How long does it take for you to get the "Error: no ricci tag in ricci response"? Is it pretty quick or does it take around 30 seconds? Thanks! Chip Burke On 8/16/12 12:30 PM, "Chris Feist" wrote: On 08/15/12 16:49, Chip Burke wrote: There is nothing in messages or secure on either node1 or 2 at that time. Ok, there's something going on with the ricci authentication on that node. Can you give me the output of 'rpm -q ricci' as well as do a '/etc/init.d/ricci restart'. 
Then on the node that is running ricci, try this command: ccs -d -h localhost --getconf (it should ask your for a password, and enter the ricci password) Thanks, Chris Chip Burke On 8/15/12 4:50 PM, "Chris Feist" wrote: On 08/15/12 15:08, Chip Burke wrote: modcluster-0.16.2-18.el6.x86_64 And [root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf ***Sending to ricci server: ***Sending End ***Received from ricci server ***Receive End xanadunode2 password: ***Sending to ricci server: ***Sending End ***Received from ricci server ***Receive End Error: no ricci tag in ricci response Thanks, can you also provide what was in /var/log/messages and /var/log/secure when those errors occurred? Thanks! Chip Burke On 8/15/12 3:56 PM, "Chris Feist" wrote: On 08/14/12 20:34, Chip Burke wrote: [root@xanadunode1 ~]# ccs -h xanadunode2 --getconf This gives me similar results. It sits and spins for a few minutes and then fails with: Error: no ricci tag in ricci response ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf Can you send me the output of 'rpm -q modcluster' and 'ccs -d -h xanadunode2 --getconf' Thanks, Chris This locks up everything going to GFS2 mounts. Two nodes recovered, the other didn't, required a fence_node. GFS2 showed this before the fence. cd: /datastore/lvol0: Input/output error Along with the error Error: no ricci tag in ricci response Chip Burke On 8/14/12 3:12 PM, "Chris Feist" wrote: Can you try using ccs to get the current configuration of that node: ccs -h --getconf As well as use ccs to try and set the conf on that node? ccs -f -h --setconf This should let us narrow down whether it's an issue with ricci or luci. Thanks! 
Chris
Re: [Linux-cluster] Ricci doesn't work
time ccs -d -h xanadunode2 --getconf Gives me real2m7.027s user0m0.053s sys 0m0.013s So it sits for quite a while. Chip Burke On 8/16/12 3:52 PM, "Chris Feist" wrote: >On 08/16/12 13:52, Chip Burke wrote: >> Here we goŠ >> >> node1: >> ricci-0.16.2-55.el6.x86_64 >> >> >> node2: >> ricci-0.16.2-55.el6.x86_64 >> >> [root@xanadunode2 ~]# service ricci stop >> Shutting down ricci: [ OK ] >> [root@xanadunode2 ~]# service ricci start >> Starting ricci:[ OK ] >> [root@xanadunode2 ~]# ccs -d -h localhost --getconf >> ***Sending to ricci server: >> >version="1.0">> name="cluster">> >>name="get_cluster.conf">>i> >> ***Sending End >> ***Received from ricci server >> >> >> >> ***Receive End >> localhost password: >> ***Sending to ricci server: >> >> ***Sending End >> ***Received from ricci server >> >> >> >> ***Receive End >> Error: no ricci tag in ricci response >> >> Of course I fudged the password with XXXs for the list. > >How long does it take for you to get the "Error: no ricci tag in ricci >response"? Is it pretty quick or does it take around 30 seconds? > >> >> >> Thanks! >> >> Chip Burke >> >> >> >> >> >> >> >> On 8/16/12 12:30 PM, "Chris Feist" wrote: >> >>> On 08/15/12 16:49, Chip Burke wrote: There is nothing in messages or secure on either node1 or 2 at that time. >>> >>> Ok, there's something going on with the ricci authentication on that >>> node. Can >>> you give me the output of 'rpm -q ricci' as well as do a >>> '/etc/init.d/ricci >>> restart'. 
>>> >>> Then on the node that is running ricci, try this command: >>> ccs -d -h localhost --getconf >>> >>> (it should ask your for a password, and enter the ricci password) >>> >>> Thanks, >>> Chris >>> Chip Burke On 8/15/12 4:50 PM, "Chris Feist" wrote: > On 08/15/12 15:08, Chip Burke wrote: >> modcluster-0.16.2-18.el6.x86_64 >> >> And >> >> [root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf >> ***Sending to ricci server: >> > version="1.0">> name="cluster">> >> >> >>name="get_cluster.conf">>ri >> cc >> i> >> ***Sending End >> ***Received from ricci server >> >> >> >> ***Receive End >> xanadunode2 password: >> ***Sending to ricci server: >> >> ***Sending End >> ***Received from ricci server >> >> >> >> ***Receive End >> Error: no ricci tag in ricci response > > Thanks, can you also provide what was in /var/log/messages and > /var/log/secure > when those errors occurred? > >> >> Thanks! >> >> >> Chip Burke >> >> >> >> >> >> >> >> On 8/15/12 3:56 PM, "Chris Feist" wrote: >> >>> On 08/14/12 20:34, Chip Burke wrote: [root@xanadunode1 ~]# ccs -h xanadunode2 --getconf This gives me similar results. It sits and spins for a few minutes and then fails with: Error: no ricci tag in ricci response ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf >>> >>> Can you send me the output of 'rpm -q modcluster' and >>> 'ccs -d -h xanadunode2 --getconf' >>> >>> Thanks, >>> Chris >>> This locks up everything going to GFS2 mounts. Two nodes recovered, the other didn't, required a fence_node. GFS2 showed this before the fence. cd: /datastore/lvol0: Input/output error Along with the error Error: no ricci tag in ricci response Chip Burke On 8/14/12 3:12 PM, "Chris Feist" wrote: > > > Can you try using ccs to get the current configuration of that > node: > ccs -h --getconf > > > As well as use ccs to try and set the conf on that node? > ccs -f -h --setconf > > This should let us narrow down whether it's an issue with ricci >or > luci. > > Thanks! 
> Chris > -- Linux-cluster mailing list Linux-cluster@redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster@redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster@redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > >
Re: [Linux-cluster] Ricci doesn't work
On 08/16/12 13:52, Chip Burke wrote: Here we goŠ node1: ricci-0.16.2-55.el6.x86_64 node2: ricci-0.16.2-55.el6.x86_64 [root@xanadunode2 ~]# service ricci stop Shutting down ricci: [ OK ] [root@xanadunode2 ~]# service ricci start Starting ricci:[ OK ] [root@xanadunode2 ~]# ccs -d -h localhost --getconf ***Sending to ricci server: ***Sending End ***Received from ricci server ***Receive End localhost password: ***Sending to ricci server: ***Sending End ***Received from ricci server ***Receive End Error: no ricci tag in ricci response Of course I fudged the password with XXXs for the list. How long does it take for you to get the "Error: no ricci tag in ricci response"? Is it pretty quick or does it take around 30 seconds? Thanks! Chip Burke On 8/16/12 12:30 PM, "Chris Feist" wrote: On 08/15/12 16:49, Chip Burke wrote: There is nothing in messages or secure on either node1 or 2 at that time. Ok, there's something going on with the ricci authentication on that node. Can you give me the output of 'rpm -q ricci' as well as do a '/etc/init.d/ricci restart'. Then on the node that is running ricci, try this command: ccs -d -h localhost --getconf (it should ask your for a password, and enter the ricci password) Thanks, Chris Chip Burke On 8/15/12 4:50 PM, "Chris Feist" wrote: On 08/15/12 15:08, Chip Burke wrote: modcluster-0.16.2-18.el6.x86_64 And [root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf ***Sending to ricci server: ***Sending End ***Received from ricci server ***Receive End xanadunode2 password: ***Sending to ricci server: ***Sending End ***Received from ricci server ***Receive End Error: no ricci tag in ricci response Thanks, can you also provide what was in /var/log/messages and /var/log/secure when those errors occurred? Thanks! Chip Burke On 8/15/12 3:56 PM, "Chris Feist" wrote: On 08/14/12 20:34, Chip Burke wrote: [root@xanadunode1 ~]# ccs -h xanadunode2 --getconf This gives me similar results. 
It sits and spins for a few minutes and then fails with: Error: no ricci tag in ricci response ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf Can you send me the output of 'rpm -q modcluster' and 'ccs -d -h xanadunode2 --getconf' Thanks, Chris This locks up everything going to GFS2 mounts. Two nodes recovered, the other didn't, required a fence_node. GFS2 showed this before the fence. cd: /datastore/lvol0: Input/output error Along with the error Error: no ricci tag in ricci response Chip Burke On 8/14/12 3:12 PM, "Chris Feist" wrote: Can you try using ccs to get the current configuration of that node: ccs -h --getconf As well as use ccs to try and set the conf on that node? ccs -f -h --setconf This should let us narrow down whether it's an issue with ricci or luci. Thanks! Chris
Re: [Linux-cluster] Ricci doesn't work
Indeed. I get in via telnet between machines and get dumped out for not using SSL, but that is to be expected. ps -A/top also show ricci happily sitting there listening. Additionally:

# netstat -lnptu | grep 1
tcp 0 0 :::1 :::* LISTEN 3210/ricci

So ricci is there.

Chip Burke

On 8/16/12 5:10 AM, "Fabio M. Di Nitto" wrote:

> Maybe a stupid question..
>
> from node1:
>
> telnet node2 1
>
> do you get anything? are the iptables set correctly? (and check also
> from node2 to node1 and from the luci machine to both nodes)
>
> Fabio
>
> On 8/15/2012 11:49 PM, Chip Burke wrote:
>> There is nothing in messages or secure on either node1 or 2 at that time.

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
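The telnet test discussed here can also be scripted, which makes it easier to repeat from each machine (node to node, and from the luci host to every node). A minimal sketch, assuming Python is available on the hosts; the `port_is_open` and `check_nodes` helpers are made up for illustration, and the port is left as a parameter so that whatever port ricci is actually listening on (per netstat) can be passed in:

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


def check_nodes(hosts, port):
    """Map each host name to whether its port accepted a TCP connection."""
    return {host: port_is_open(host, port) for host in hosts}
```

Like telnet, a successful connect only shows that something accepts TCP connections on that port and that iptables is not blocking it; it says nothing about the SSL handshake or the ricci authentication that follows.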
Re: [Linux-cluster] Ricci doesn't work
Here we go...

node1: ricci-0.16.2-55.el6.x86_64
node2: ricci-0.16.2-55.el6.x86_64

[root@xanadunode2 ~]# service ricci stop
Shutting down ricci: [ OK ]
[root@xanadunode2 ~]# service ricci start
Starting ricci: [ OK ]
[root@xanadunode2 ~]# ccs -d -h localhost --getconf
***Sending to ricci server:
***Sending End
***Received from ricci server
***Receive End
localhost password:
***Sending to ricci server:
***Sending End
***Received from ricci server
***Receive End
Error: no ricci tag in ricci response

Of course I fudged the password with XXXs for the list.

Thanks!

Chip Burke

On 8/16/12 12:30 PM, "Chris Feist" wrote:

> On 08/15/12 16:49, Chip Burke wrote:
>> There is nothing in messages or secure on either node1 or 2 at that time.
>
> Ok, there's something going on with the ricci authentication on that
> node. Can you give me the output of 'rpm -q ricci' as well as do a
> '/etc/init.d/ricci restart'.
>
> Then on the node that is running ricci, try this command:
> ccs -d -h localhost --getconf
>
> (it should ask your for a password, and enter the ricci password)
>
> Thanks,
> Chris

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
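The error text suggests the client could not find a `ricci` element in whatever came back from the server. A rough sketch of that kind of check, assuming the reply is expected to be an XML document whose root element is named `ricci`; this is not the actual ccs code, and `classify_reply` and its return labels are made up for illustration. A helper along these lines can at least tell apart an empty reply, a reply that is not well-formed XML, and well-formed XML with some other root:

```python
import xml.etree.ElementTree as ET


def classify_reply(raw):
    """Classify a raw reply string as 'empty', 'malformed', 'other-root' or 'ricci'."""
    if not raw or not raw.strip():
        return "empty"
    try:
        root = ET.fromstring(raw)
    except ET.ParseError:
        # Not well-formed XML, e.g. an unescaped '&' in an attribute value.
        return "malformed"
    return "ricci" if root.tag == "ricci" else "other-root"
```

Feeding it the text that appears between the "***Received from ricci server" and "***Receive End" markers of the `ccs -d` output would show which of these cases applies.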
Re: [Linux-cluster] Ricci doesn't work
On 08/15/12 16:49, Chip Burke wrote:
> There is nothing in messages or secure on either node1 or 2 at that time.

Ok, there's something going on with the ricci authentication on that node. Can you give me the output of 'rpm -q ricci' as well as do a '/etc/init.d/ricci restart'.

Then on the node that is running ricci, try this command:
ccs -d -h localhost --getconf

(it should ask you for a password; enter the ricci password)

Thanks,
Chris

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Re: [Linux-cluster] Ricci doesn't work
Maybe a stupid question..

from node1:

telnet node2 1

do you get anything? are the iptables set correctly? (and check also from node2 to node1 and from the luci machine to both nodes)

Fabio

On 8/15/2012 11:49 PM, Chip Burke wrote:
> There is nothing in messages or secure on either node1 or 2 at that time.
>
> Chip Burke

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Re: [Linux-cluster] Ricci doesn't work
There is nothing in messages or secure on either node1 or 2 at that time.

Chip Burke

On 8/15/12 4:50 PM, "Chris Feist" wrote:

> Thanks, can you also provide what was in /var/log/messages and
> /var/log/secure when those errors occurred?

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Re: [Linux-cluster] Ricci doesn't work
On 08/15/12 15:08, Chip Burke wrote:
> modcluster-0.16.2-18.el6.x86_64
>
> And
>
> [root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf
> ***Sending to ricci server:
> ***Sending End
> ***Received from ricci server
> ***Receive End
> xanadunode2 password:
> ***Sending to ricci server:
> ***Sending End
> ***Received from ricci server
> ***Receive End
> Error: no ricci tag in ricci response

Thanks, can you also provide what was in /var/log/messages and /var/log/secure when those errors occurred?

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Re: [Linux-cluster] Ricci doesn't work
modcluster-0.16.2-18.el6.x86_64

And

[root@xanadunode1 ~]# ccs -d -h xanadunode2 --getconf
***Sending to ricci server:
***Sending End
***Received from ricci server
***Receive End
xanadunode2 password:
***Sending to ricci server:
***Sending End
***Received from ricci server
***Receive End
Error: no ricci tag in ricci response

Thanks!

Chip Burke

On 8/15/12 3:56 PM, "Chris Feist" wrote:

> Can you send me the output of 'rpm -q modcluster' and
> 'ccs -d -h xanadunode2 --getconf'
>
> Thanks,
> Chris

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Re: [Linux-cluster] Ricci doesn't work
On 08/14/12 20:34, Chip Burke wrote:
> [root@xanadunode1 ~]# ccs -h xanadunode2 --getconf
>
> This gives me similar results. It sits and spins for a few minutes and
> then fails with:
>
> Error: no ricci tag in ricci response
>
> ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf

Can you send me the output of 'rpm -q modcluster' and 'ccs -d -h xanadunode2 --getconf'

Thanks,
Chris

> This locks up everything going to GFS2 mounts. Two nodes recovered, the
> other didn't, required a fence_node. GFS2 showed this before the fence.
>
> cd: /datastore/lvol0: Input/output error
>
> Along with the error Error: no ricci tag in ricci response

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Re: [Linux-cluster] Ricci doesn't work
[root@xanadunode1 ~]# ccs -h xanadunode2 --getconf

This gives me similar results. It sits and spins for a few minutes and then fails with:

Error: no ricci tag in ricci response

ccs -f /etc/cluster/cluster.conf -h xanadunode2 --setconf

This locks up everything going to GFS2 mounts. Two nodes recovered; the other didn't and required a fence_node. GFS2 showed this before the fence:

cd: /datastore/lvol0: Input/output error

Along with the error: Error: no ricci tag in ricci response

Chip Burke

On 8/14/12 3:12 PM, "Chris Feist" wrote:

> Can you try using ccs to get the current configuration of that node:
> ccs -h --getconf
>
> As well as use ccs to try and set the conf on that node?
> ccs -f -h --setconf
>
> This should let us narrow down whether it's an issue with ricci or luci.
>
> Thanks!
> Chris

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Re: [Linux-cluster] Ricci doesn't work
On 08/13/12 16:38, Chip Burke wrote:
> Ricci is seemingly not working through either Luci or cman_tool. There
> doesn't seem to be a lot of logging to go off of (at least I haven't found
> it) but what I did find in the Luci log is as follows:
>
> 15:51:06,793 ERROR [luci.lib.ricci_communicator] Error receiving header from node2.domain.local:1
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/site-packages/luci/lib/ricci_communicator.py", line 121, in __init__
>     hello = self.__receive(self.__timeout_init)
>   File "/usr/lib64/python2.6/site-packages/luci/lib/ricci_communicator.py", line 503, in __receive
>     errstr = _('Error reading from %s:%d: %s') \
>   File "/usr/lib/python2.6/site-packages/pylons/i18n/translation.py", line 106, in ugettext
>     return pylons.translator.ugettext(value)
>   File "/usr/lib/python2.6/site-packages/paste/registry.py", line 137, in __getattr__
>     return getattr(self._current_obj(), attr)
>   File "/usr/lib/python2.6/site-packages/paste/registry.py", line 197, in _current_obj
>     'thread' % self.name__)
> TypeError: No object (name: translator) has been registered for this thread
> 15:51:06,793 ERROR [luci.lib.ricci_helpers] Error receiving header from node2..local:1
> 15:51:06,793 ERROR [luci.lib.ricci_helpers] Error retrieving batch number from node3.X.local: Error receiving header from node3.X.local:1
>
> Cluster config I am trying to push:
>
> Running config:
>
> Any ideas? SCP and reboots are fun and all, but I would love Ricci to work.
>
> Thanks!

Can you try using ccs to get the current configuration of that node:
ccs -h --getconf

As well as use ccs to try and set the conf on that node?
ccs -f -h --setconf

This should let us narrow down whether it's an issue with ricci or luci.

Thanks!
Chris

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Re: [Linux-cluster] Ricci doesn't work
CentOS 6.2, cman version 3.0.12.1

As for the syslog, I have nothing on either the node sending or the nodes receiving. There is no output at all.

And after troubleshooting further,

# cman_tool version -r -S

works if I do a manual push via scp, so it must be getting hung up on the file transfer.

Running ccs_sync manually doesn't give me any different results. After a minute or two things fail with the output:

The connection to node1 died unexpectedly
The connection to node3 died unexpectedly
The connection to node2 died unexpectedly

And still nothing in the syslog. I also checked the secure log to see if authentication was bombing out, but there was nothing conclusive there either.

On 8/13/12 7:58 PM, "Digimer" wrote:

> What OS and cluster versions? What is in syslog on the three nodes when
> this occurs?
>
> --
> Digimer
> Papers and Projects: https://alteeve.com

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
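When the configuration is pushed by hand with scp, it can help to confirm that every node really ended up with the same version of the file before activating it. A small sketch, assuming each node's cluster.conf has been copied somewhere readable and that the root `cluster` element carries a `config_version` attribute, as it normally does; the function names and any paths passed in are illustrative only:

```python
import xml.etree.ElementTree as ET


def config_version(path):
    """Return config_version from the root element of a cluster.conf file, as an int."""
    root = ET.parse(path).getroot()
    return int(root.attrib["config_version"])


def versions_match(paths):
    """Return (all_equal, {path: version}) for the given cluster.conf copies."""
    versions = {p: config_version(p) for p in paths}
    return len(set(versions.values())) <= 1, versions
```

This only compares the version numbers, not the full contents, so two files with the same `config_version` but different bodies would still pass; a checksum comparison would be needed to rule that out.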
Re: [Linux-cluster] Ricci doesn't work
On 08/13/2012 05:38 PM, Chip Burke wrote:
> Ricci is seemingly not working through either Luci or cman_tool. There
> doesn't seem to be a lot of logging to go off of (at least I haven't found
> it) but what I did find in the Luci log is as follows:
>
> 15:51:06,793 ERROR [luci.lib.ricci_communicator] Error receiving header from node2.domain.local:1
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/site-packages/luci/lib/ricci_communicator.py", line 121, in __init__
>     hello = self.__receive(self.__timeout_init)
>   File "/usr/lib64/python2.6/site-packages/luci/lib/ricci_communicator.py", line 503, in __receive
>     errstr = _('Error reading from %s:%d: %s') \
>   File "/usr/lib/python2.6/site-packages/pylons/i18n/translation.py", line 106, in ugettext
>     return pylons.translator.ugettext(value)
>   File "/usr/lib/python2.6/site-packages/paste/registry.py", line 137, in __getattr__
>     return getattr(self._current_obj(), attr)
>   File "/usr/lib/python2.6/site-packages/paste/registry.py", line 197, in _current_obj
>     'thread' % self.name__)
> TypeError: No object (name: translator) has been registered for this thread
> 15:51:06,793 ERROR [luci.lib.ricci_helpers] Error receiving header from node2..local:1
> 15:51:06,793 ERROR [luci.lib.ricci_helpers] Error retrieving batch number from node3.X.local: Error receiving header from node3.X.local:1
>
> Cluster config I am trying to push:
>
> Running config:
>
> Any ideas? SCP and reboots are fun and all, but I would love Ricci to work.
>
> Thanks!

What OS and cluster versions? What is in syslog on the three nodes when this occurs?

--
Digimer
Papers and Projects: https://alteeve.com

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster