Re: [Gluster-devel] Problem with TLA ver 887
Mickey, Can you check if the latest tla code has resolved the issues you faced? Thanks, Avati On Sun, Feb 8, 2009 at 11:19 PM, Mickey Mazarick m...@digitaltadpole.com wrote: Heh, our tests are kind of an unholy mess... but here's the part I think is useful: We use a startup script that will iterate through vol files and mount the first available file on the list. We have a bunch of vol files that test a few different server configurations. After the mount points are prepared, we have other scripts that start virtual machines on the various mounts. In other words, I have a directory called /glustermounts/ and in that directory I have the files: main.vol main.vol.ib main.vol.tcp stripe.vol.ha stripe.vol.tcp. After running /etc/init.d/glustersystem start I will have the following mount points: /system (our default mount; we actually store the vol files here), /mnt/main, /mnt/stripe. The output shows me if any vol file failed to mount, and the script automatically attempts the next one (e.g. "mounting main.vol failed, trying main.vol.ib"). We simply arrange vol files from most features to least. We have a separate script which starts up a virtual machine on each test mount. This is the actual "test" we use, as it creates symbolic links, uses mmaps, etc., but it's pretty specific to us. This closely mirrors how we use it in production. I've included our startup script, and I would suggest you simply run something similar to your production workload on a few mounts in the same way we have. I may share this with the entire group, although there are probably better init scripts out there. This one does kill all processes attached to a mount point, which is useful. Let me know if you have any questions! Thanks! -Mickey Mazarick Geoff Kassel wrote: Hi, As a fellow GlusterFS user, I was just wondering if you could point me to the regression tests you're using for GlusterFS?
I've looked high and low for the unit tests that the GlusterFS devs are meant to be using (a la http://www.gluster.org/docs/index.php/GlusterFS_QA) so that I can do my own testing, but I've not been able to find them. If it's tests you've developed in-house, would you be interested in releasing them to the wider community? Kind regards, Geoff Kassel. On Thu, 5 Feb 2009, Mickey Mazarick wrote: I haven't done any full regression testing to see where the problem is, but the later TLA versions are causing our storage servers to spike to 100% CPU usage and the clients never see any files. Our initial tests are with ibverbs/HA but no performance translators. Thanks! -Mickey Mazarick
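[Editorial sketch] The fallback scheme Mickey describes — walk the vol files in order, most features first, and stop at the first one that mounts — can be sketched roughly as below. try_mount and the file names are illustrative stand-ins, not the real script; the real invocation would be a glusterfs mount command.

```shell
#!/bin/sh
# Rough sketch of the "first vol file that mounts wins" loop.
# try_mount stands in for the real glusterfs invocation; here we
# pretend only the plain-tcp vol file can actually be mounted.
try_mount() {
    # The real script would run something like:
    #   /usr/local/sbin/glusterfs -f "$1" "$2"
    case $1 in
        *.tcp) return 0 ;;  # pretend the tcp transport works
        *)     return 1 ;;  # pretend ib-verbs/ha variants fail
    esac
}

mount_first() {
    mountpt=$1; shift
    for spec in "$@"; do
        if try_mount "$spec" "$mountpt"; then
            echo "mounted $spec at $mountpt"
            return 0
        fi
        echo "mounting $spec failed, trying next"
    done
    echo "no vol file could be mounted at $mountpt" >&2
    return 1
}

mount_first /mnt/main main.vol.ib main.vol.tcp
```

Ordering the list from most featureful to least, as the email suggests, means the loop degrades gracefully: the first success is always the best configuration currently available.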
Re: [Gluster-devel] Problem with TLA ver 887
We did a tla install of 906 yesterday and the problem seems to have been resolved by that build. Thanks and keep up the great work! -Mic Anand Avati wrote: Mickey, Can you check if the latest tla code has resolved the issues you faced? Thanks, Avati
Re: [Gluster-devel] Problem with TLA ver 887
Heh, our tests are kind of an unholy mess... but here's the part I think is useful: We use a startup script that will iterate through vol files and mount the first available file on the list. We have a bunch of vol files that test a few different server configurations. After the mount points are prepared, we have other scripts that start virtual machines on the various mounts. In other words, I have a directory called "/glustermounts/" and in that directory I have the files: main.vol main.vol.ib main.vol.tcp stripe.vol.ha stripe.vol.tcp. After running "/etc/init.d/glustersystem start" I will have the following mount points: /system (our default mount; we actually store the vol files here), /mnt/main, /mnt/stripe. The output shows me if any vol file failed to mount, and the script automatically attempts the next one (e.g. "mounting main.vol failed, trying main.vol.ib"). We simply arrange vol files from most features to least. We have a separate script which starts up a virtual machine on each test mount. This is the actual "test" we use, as it creates symbolic links, uses mmaps, etc., but it's pretty specific to us. This closely mirrors how we use it in production. I've included our startup script, and I would suggest you simply run something similar to your production workload on a few mounts in the same way we have. I may share this with the entire group, although there are probably better init scripts out there. This one does kill all processes attached to a mount point, which is useful. Let me know if you have any questions! Thanks! -Mickey Mazarick Geoff Kassel wrote: Hi, As a fellow GlusterFS user, I was just wondering if you could point me to the regression tests you're using for GlusterFS? I've looked high and low for the unit tests that the GlusterFS devs are meant to be using (a la http://www.gluster.org/docs/index.php/GlusterFS_QA) so that I can do my own testing, but I've not been able to find them.
If it's tests you've developed in-house, would you be interested in releasing them to the wider community? Kind regards, Geoff Kassel. On Thu, 5 Feb 2009, Mickey Mazarick wrote: I haven't done any full regression testing to see where the problem is, but the later TLA versions are causing our storage servers to spike to 100% CPU usage and the clients never see any files. Our initial tests are with ibverbs/HA but no performance translators. Thanks! -Mickey Mazarick
--
#!/bin/sh
# Startup script for gluster Mount system
volFiles="/glustermounts/"
defaultcheckFile="customers"
speclist="/etc/glusterfs-system.vol.ibverbs /etc/glusterfs-system.vol.ha /etc/glusterfs-system.vol.ibverbs /etc/glusterfs-system.vol.tcp"

start() {
    specfile=${1}
    if [ "$#" -gt 1 ]; then
        mountpt=${2}
    else
        mountpt=`echo ${specfile} | sed "s#\.vol.*\\\$##" | sed "s#/.*/##"`
        mountpt="/mnt/${mountpt}"
    fi
    logfile=`echo ${specfile} | sed "s#\.vol.*\\\$##" | sed "s#/.*/##"`
    logfile="/var/${logfile}.log"
    pidfile=`echo ${specfile} | sed "s#\.vol.*\\\$##" | sed "s#/.*/##"`
    pidfile="/var/run/${pidfile}.pid"
    echo "mounting specfile:${specfile} at:${mountpt} with pid at:${pidfile}"
    currentpids=`pidof glusterfs`
    currentpids="0 ${currentpids}"
    mountct=`mount | grep ${mountpt} | grep -c glusterfs`
    if [ -f $pidfile ]; then
        # pid file exists: check the recorded pid is still alive
        currentpid=`cat ${pidfile}`
        pidct=`echo "${currentpids}" | grep -c ${currentpid}`
        if [ "${pidct}" -eq 0 ]; then
            rm -rf ${pidfile}
            echo "removing pid file: ${pidfile}"
        fi
        if [ "${mountct}" -lt 1 ]; then
            echo "Gluster System mount:${mountpt} died. Remounting."
            stop ${mountpt} ${pidfile}
        fi
    else
        rm -rf ${pidfile}
        if [ "${mountct}" -gt 0 ]; then
            # mounted but no pid file: recover the pid from ps
            myupid=`ps -ef | grep /system | grep gluster | sed "s#root\s*##" | sed "s#\s.*##"`
            if [ "${myupid}" -gt 0 ]; then
                echo "${myupid}" > ${pidfile}
            else
                echo "Gluster System mounted at:${mountpt} but with no pid. Remounting."
                stop ${mountpt} ${pidfile}
            fi
        fi
    fi
    if [ -e $pidfile ]; then
        echo "Gluster System Mount:${mountpt} is running with spec: ${specfile}"
        #echo "Gluster System Mount:${mountpt} is running."
        return 0
    else
        #rm -rf /var/glustersystemclient.log
        modprobe fuse
        sleep 1.5
        #rm -rf /var/glustersystemclient.log
        mkdir ${mountpt}
        rm -rf $pidfile
        cmd="/usr/local/sbin/glusterfs -p $pidfile -l ${logfile} -L ERROR -f ${specfile} --disable-direct-io-mode ${mountpt}"
        echo ${cmd}
        ${cmd}
        #/usr/local/sbin/glusterfs -p $pidfile -l ${logfile}
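[Editorial sketch] The health check at the heart of start() boils down to: a mount is only considered alive if its pid file names a running glusterfs process and the mount point actually appears in the mount table. A condensed, testable sketch of that logic (function names are mine, not the script's):

```shell
#!/bin/sh
# Condensed sketch of the start() liveness check. pid_is_live does a
# whole-word match, which avoids the substring pitfall of the script's
# `grep -c ${currentpid}` (there, pid 123 would also match pid 1234).
pid_is_live() {
    # $1 = pid from the pid file, $2 = output of `pidof glusterfs`
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

mount_is_healthy() {
    # $1 = pid file, $2 = mount point, $3 = live pids, $4 = mount table text
    [ -f "$1" ] || return 1
    pid_is_live "$(cat "$1")" "$3" || return 1
    echo "$4" | grep "$2" | grep -q glusterfs || return 1
    return 0
}
```

If either condition fails, the original script falls through to its remount path; the whole-word pid match is the one behavioral tightening in this sketch.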
Re: [Gluster-devel] Problem with TLA ver 887
Hi Mickey, Thanks for this. Cheers, Geoff Kassel. On Mon, 9 Feb 2009, Mickey Mazarick wrote: Heh, our tests are kind of an unholy mess... but here's the part I think is useful: We use a startup script that will iterate through vol files and mount the first available file on the list.
___ Gluster-devel mailing list Gluster-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Problem with TLA ver 887
Hi Mickey, Can you just attach gdb to the server process and see 'bt'? Regards, Amar 2009/2/4 Mickey Mazarick m...@digitaltadpole.com: I haven't done any full regression testing to see where the problem is, but the later TLA versions are causing our storage servers to spike to 100% CPU usage and the clients never see any files. Our initial tests are with ibverbs/HA but no performance translators. Thanks! -Mickey Mazarick -- Amar Tumballi Gluster/GlusterFS Hacker [bulde on #gluster/irc.gnu.org] http://www.zresearch.com - Commoditizing Super Storage!
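[Editorial sketch] One non-interactive way to capture the 'bt' Amar asks for is to attach gdb in batch mode and dump every thread. The helper below is a sketch; the process name you pass (e.g. glusterfs or glusterfsd) and the use of pidof are assumptions for illustration, and it requires gdb plus permission to ptrace the target.

```shell
#!/bin/sh
# Sketch: attach gdb to a named process and dump backtraces of all
# threads, non-interactively.
capture_bt() {
    pid=`pidof "$1" 2>/dev/null | awk '{print $1}'`
    if [ -z "$pid" ]; then
        echo "$1 is not running" >&2
        return 1
    fi
    # -batch exits after running the -ex commands
    gdb -p "$pid" -batch -ex "thread apply all bt"
}
```

Running something like `capture_bt glusterfsd > bt.txt` while the CPU is pegged at 100% would show which thread is spinning.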
Re: [Gluster-devel] Problem with TLA ver 887
2009/2/5 Mickey Mazarick m...@digitaltadpole.com: I haven't done any full regression testing to see where the problem is, but the later TLA versions are causing our storage servers to spike to 100% CPU usage and the clients never see any files. Our initial tests are with ibverbs/HA but no performance translators. Thanks for reporting. I found a bug that was introduced into features/locks which would have caused a deadlock and thus 100% CPU. It has been fixed in patch-892. Can you please verify that the bug is gone? Vikas -- Engineer - Z Research http://gluster.com/
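[Editorial sketch] A quick way to check whether the symptom is gone after moving to patch-892 is to sample the server process's CPU share. This sketch is illustrative only: the process name you pass and the 90% alarm threshold are arbitrary choices, not anything from the thread.

```shell
#!/bin/sh
# Sketch: report whether a named process is pegging a CPU.
# ps -o %cpu= prints the CPU usage column with no header.
cpu_pegged() {
    pid=`pidof "$1" 2>/dev/null | awk '{print $1}'`
    [ -n "$pid" ] || { echo "$1 not running" >&2; return 2; }
    cpu=`ps -o %cpu= -p "$pid" | tr -d ' '`
    echo "$1 cpu: ${cpu}%"
    # strip the decimal part and compare as an integer;
    # 90 is an arbitrary alarm threshold
    [ "${cpu%%.*}" -lt 90 ]
}
```

A healthy, mostly idle glusterfs server should sit far below the threshold; the deadlocked builds reported in this thread would fail this check immediately.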