[Gluster-users] Client and server file view, different results?! Client can't see the right file.
Hi all!

Here we have another mismatch between the client view and the server mounts. From the server side everything seems fine: the 20G file is visible and the attributes seem to match:

0 root@pserver5:~ # getfattr -R -d -e hex -m trusted.afr. /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/
# file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images//20964
trusted.afr.storage0-client-2=0x
trusted.afr.storage0-client-3=0x

0 root@pserver5:~ # find /mnt/gluster/ -name 20964 | xargs -i ls -al {}
-rwxrwx--- 1 libvirt-qemu vcb 21474836480 May 13 11:21 /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20964

But the client view shows 2 (!) files with 0 byte size! And these aren't link files created by Gluster (the ones with the T at the end of the permission bits):

0 root@pserver5:~ # find /opt/profitbricks/storage/ -name 20964 | xargs -i ls -al {}
-rwxrwx--- 1 libvirt-qemu kvm 0 May 13 11:24 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20964
-rwxrwx--- 1 libvirt-qemu kvm 0 May 13 11:24 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20964

I'm a bit stumped that we seem to have so many weird errors cropping up. Any ideas? I've checked the ext4 filesystem on all boxes; no real problems. We run a distributed cluster with 4 servers offering 2 bricks each.

Best, Martin

-Original Message-
From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, May 16, 2011 2:24 AM
To: Martin Schenker
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Brick pair file mismatch, self-heal problems?

Try this to trigger self heal:

find <gluster-mount> -noleaf -print0 -name <file name> | xargs --null stat > /dev/null

On Sun, May 15, 2011 at 11:20 AM, Martin Schenker martin.schen...@profitbricks.com wrote:

Can someone enlighten me what's going on here?
We have two peers; the file 21313 is shown through the client mountpoint as 1 Jan 1970, attribs on server pserver3 don't match, but NO self-heal or repair can be triggered through ls -alR?!? Checking the files through the server mounts shows that two versions are on the system. But the wrong one (the one with the 1 Jan 1970 timestamp) seems to be the one preferred by the client?!? Do I need to use setattr or what in order to get the client to see the RIGHT version?!? This is not the ONLY file displaying this problematic behaviour!

Thanks for any feedback.

Martin

pserver5:

0 root@pserver5:~ # ls -al /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images
-rwxrwx--- 1 libvirt-qemu vcb 483183820800 May 13 13:41 21313

0 root@pserver5:~ # getfattr -R -d -e hex -m trusted.afr. /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
trusted.afr.storage0-client-2=0x
trusted.afr.storage0-client-3=0x

0 root@pserver5:~ # ls -alR /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
-rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313

pserver3:

0 root@pserver3:~ # ls -al /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images
-rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 21313

0 root@pserver3:~ # ls -alR /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
-rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313

0 root@pserver3:~ # getfattr -R -d -e hex -m trusted.afr.
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
trusted.afr.storage0-client-2=0x
trusted.afr.storage0-client-3=0x0b090900 <- mismatch, should be targeted for self-heal/repair?

Why is there a difference in the views?

From the volfile:

volume storage0-client-2
    type protocol/client
    option remote-host de-dc1-c1-pserver3
    option remote-subvolume /mnt/gluster/brick1/storage
    option transport-type rdma
    option ping-timeout 5
end-volume

volume storage0-client-3
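For reference, the trusted.afr values being compared above are pending-operation changelogs. A minimal sketch of how to read one, assuming the usual AFR layout (24 hex digits, i.e. three big-endian 32-bit counters for pending data, metadata, and entry operations); the value decoded below is a made-up example, not the truncated one from pserver3:

```shell
# Made-up trusted.afr changelog value (assumed layout: 24 hex digits =
# three 32-bit big-endian counters of pending data/metadata/entry ops).
val="0x000000020000000100000000"
hex=${val#0x}                 # drop the 0x prefix
data=$((16#${hex:0:8}))       # pending data operations
meta=$((16#${hex:8:8}))       # pending metadata operations
entry=$((16#${hex:16:8}))     # pending entry (directory) operations
echo "data=$data metadata=$meta entry=$entry"
```

All-zero counters on both bricks mean the replicas consider each other clean; a non-zero counter on one brick marks the other copy as needing heal, which is why a mismatch like the one above is a self-heal candidate.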
Re: [Gluster-users] Géo-rep fail
Hi -

Do you have passwordless ssh login to the slave machine? After setting up passwordless login, please try this:

# gluster volume geo-replication athena root@$(hostname):/soft/venus start

or

# gluster volume geo-replication athena $(hostname):/soft/venus start

Wait a few seconds, then verify the status. For the minimum requirements, check out http://www.gluster.com/community/documentation/index.php/Gluster_3.2:_Checking_Geo-replication_Minimum_Requirements

HTH

--
Cheers,
Lakshmipathi.G
FOSS Programmer.

- Original Message -
From: anthony garnier sokar6...@hotmail.com
To: gluster-users@gluster.org
Sent: Monday, May 16, 2011 5:06:22 PM
Subject: [Gluster-users] Géo-rep fail

Hi,

I'm currently trying to use geo-rep on the local data-node into a directory, but it fails with status faulty.

Volume:

Volume Name: athena
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ylal3020:/users/exp1
Brick2: yval1010:/users/exp3
Brick3: ylal3030:/users/exp2
Brick4: yval1000:/users/exp4
Options Reconfigured:
geo-replication.indexing: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.cache-max-file-size: 256MB
network.ping-timeout: 5
performance.cache-size: 512MB
performance.cache-refresh-timeout: 60
nfs.port: 2049

I've run these commands:

# gluster volume geo-replication athena /soft/venus config
# gluster volume geo-replication athena /soft/venus start
# gluster volume geo-replication athena /soft/venus status
MASTER               SLAVE                STATUS
athena               /soft/venus          faulty

Here is the log file in debug mode:

[2011-05-16 13:28:55.268006] I [monitor(monitor):42:monitor] Monitor:
[2011-05-16 13:28:55.268281] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-05-16 13:28:55.326309] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:athena -> file:///soft/venus
[2011-05-16 13:28:55.327905] D [repce:131:push] RepceClient: call 10888:47702589471600:1305545335.33 __repce_version__() ...
[2011-05-16 13:28:55.462613] D [repce:141:__call__] RepceClient: call 10888:47702589471600:1305545335.33 __repce_version__ -> 1.0
[2011-05-16 13:28:55.462886] D [repce:131:push] RepceClient: call 10888:47702589471600:1305545335.46 version() ...
[2011-05-16 13:28:55.463330] D [repce:141:__call__] RepceClient: call 10888:47702589471600:1305545335.46 version -> 1.0
[2011-05-16 13:28:55.480202] D [resource:381:connect] GLUSTER: auxiliary glusterfs mount in place
[2011-05-16 13:28:55.682863] D [resource:393:connect] GLUSTER: auxiliary glusterfs mount prepared
[2011-05-16 13:28:55.684926] D [monitor(monitor):57:monitor] Monitor: worker got connected in 0 sec, waiting 59 more to make sure it's fine
[2011-05-16 13:28:55.685096] D [repce:131:push] RepceClient: call 10888:1115703616:1305545335.68 keep_alive(None,) ...
[2011-05-16 13:28:55.685859] D [repce:141:__call__] RepceClient: call 10888:1115703616:1305545335.68 keep_alive -> 1
[2011-05-16 13:28:59.546574] D [master:167:volinfo_state_machine] <top>: (None, None) (None, 28521f8f) -> (None, 28521f8f)
[2011-05-16 13:28:59.546863] I [master:184:crawl] GMaster: new master is 28521f8f-49d3-4e2a-b984-f664f44f5289
[2011-05-16 13:28:59.547034] I [master:191:crawl] GMaster: primary master with volume id 28521f8f-49d3-4e2a-b984-f664f44f5289 ...
[2011-05-16 13:28:59.547180] D [master:199:crawl] GMaster: entering .
[2011-05-16 13:28:59.548289] D [repce:131:push] RepceClient: call 10888:47702589471600:1305545339.55 xtime('.', '28521f8f-49d3-4e2a-b984-f664f44f5289') ...
[2011-05-16 13:28:59.596978] E [syncdutils:131:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
    tf(*aa)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
    rid, exc, res = recv(self.inf)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
    return pickle.load(inf)
EOFError

Has anyone seen these errors before?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Géo-rep fail
Hi,

Yes, my machine has passwordless ssh login, but it still doesn't work. I also meet the minimum requirements, with the right software versions.

Date: Mon, 16 May 2011 07:02:57 -0500
From: lakshmipa...@gluster.com
To: sokar6...@hotmail.com
CC: gluster-users@gluster.org
Subject: Re: [Gluster-users] Géo-rep fail

Hi -

Do you have passwordless ssh login to the slave machine? After setting up passwordless login, please try this:

# gluster volume geo-replication athena root@$(hostname):/soft/venus start

or

# gluster volume geo-replication athena $(hostname):/soft/venus start

Wait a few seconds, then verify the status.

[...]
Re: [Gluster-users] Géo-rep fail
On 05/16/11 17:06, anthony garnier wrote:

Hi,

I'm currently trying to use geo-rep on the local data-node into a directory, but it fails with status faulty.

[...]

I've run these commands:

# gluster volume geo-replication athena /soft/venus config
# gluster volume geo-replication athena /soft/venus start
# gluster volume geo-replication athena /soft/venus status
MASTER               SLAVE                STATUS
athena               /soft/venus          faulty

Here is the log file in debug mode:

[2011-05-16 13:28:55.268006] I [monitor(monitor):42:monitor] Monitor:
[2011-05-16 13:28:55.268281] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker

[...]

[2011-05-16 13:28:59.547034] I [master:191:crawl] GMaster: primary master with volume id 28521f8f-49d3-4e2a-b984-f664f44f5289 ...
[2011-05-16 13:28:59.547180] D [master:199:crawl] GMaster: entering .
[2011-05-16 13:28:59.548289] D [repce:131:push] RepceClient: call 10888:47702589471600:1305545339.55 xtime('.', '28521f8f-49d3-4e2a-b984-f664f44f5289') ...
[2011-05-16 13:28:59.596978] E [syncdutils:131:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
    tf(*aa)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
    rid, exc, res = recv(self.inf)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
    return pickle.load(inf)
EOFError

Has anyone seen these errors before?

This means the slave gsyncd instance could not start up properly. To debug this further, we need to see the slave-side logs. In your case, the following commands will set a debug log level for the slave (they take effect if run before starting the geo-replication session) and locate its log file:

# gluster volume geo-replication athena /soft/venus config log-level DEBUG
# gluster volume geo-replication athena /soft/venus config log-file

The output of the latter will contain an unresolved parameter, ${session-owner}.
To get its actual value, run:

# gluster volume geo-replication athena /soft/venus config session-owner

Please post the content of the actual log file, the path to which you get after the substitution. (Also, cf. http://www.gluster.com/community/documentation/index.php/Gluster_3.2:_Configuring_Geo-replication ; slave-side logs are illustrated there.)

Csaba
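The substitution Csaba describes can be sketched as follows. The log-file template path below is hypothetical (the real one depends on the install and comes from `config log-file`), and the session-owner UUID reuses the volume id seen in Anthony's logs purely for illustration:

```shell
# Hypothetical template as returned by `config log-file`.
logfile_template='/var/log/glusterfs/geo-replication-slaves/${session-owner}:gluster.log'
# UUID as reported by `config session-owner` (illustrative value).
session_owner='28521f8f-49d3-4e2a-b984-f664f44f5289'

# Substitute the unresolved ${session-owner} parameter to get the
# actual slave-side log path.
logfile=$(printf '%s\n' "$logfile_template" | sed "s|\${session-owner}|$session_owner|")
echo "$logfile"
```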
Re: [Gluster-users] Géo-rep fail
Lakshmi,

On 05/16/11 17:32, Lakshmipathi.G wrote:

Hi -

Do you have passwordless ssh login to the slave machine? After setting up passwordless login, please try this:

# gluster volume geo-replication athena root@$(hostname):/soft/venus start

or

# gluster volume geo-replication athena $(hostname):/soft/venus start

Throwing ssh into the pot is quite unnecessary and, with respect to debugging Anthony's issue, plain confusing. He's trying to set up a simple local file slave, which is the simplest possible setup. So let's first see what the problem is with that, as such. An ssh slave is conceptually different from a direct file slave, OK folks? If your food is rotten, wrapping it in a pancake won't make it better. So just leave ssh alone till the user her/himself is concerned about it.

Csaba
[Gluster-users] Gluster 3.2 optimization options?
Hey,

Are there any optimization options for 3.2 like there were for the 3.1 versions? I specifically need to allow more client and server connections. Without these, 3.1 would lock up all the time when I was using it for web image storage. After allowing more than the stock number of connections, it ran like a champ. I have been having issues with 3.2, and I believe it is because the default configs are not optimized.

Please let me know!

Justice London
Re: [Gluster-users] Client and server file view, different results?! Client can't see the right file.
What happens when you read the file? Do you see the right contents? These look like the linkfiles created in order to locate files on the right server. Did you recently upgrade or add/remove bricks?

Can you also look at the gfid on these files from the server side? Run:

getfattr -d -m - <file name>

On Mon, May 16, 2011 at 2:19 AM, Martin Schenker martin.schen...@profitbricks.com wrote:

Hi all! Here we have another mismatch between the client view and the server mounts: From the server side everything seems fine, the 20G file is visible and the attributes seem to match...

[...]
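Mohit's gfid check boils down to comparing trusted.gfid for the same file across bricks: a healthy replica pair reports the identical value. A sketch, parsing inlined sample getfattr output (the gfid value is invented); on the servers the input would instead come from something like `getfattr -d -m . -e hex <brick-path>/20964` on each brick:

```shell
# Extract the trusted.gfid value from getfattr-style output.
gfid_of() { awk -F= '/^trusted\.gfid=/ {print $2}'; }

# Sample brick-side output (invented values) for the same file on two peers.
gfid_pserver5=$(gfid_of <<'EOF'
trusted.afr.storage0-client-2=0x000000000000000000000000
trusted.gfid=0x8f4e2b1caa0140d2b7f30e59c1d2a3b4
EOF
)
gfid_pserver3=$(gfid_of <<'EOF'
trusted.gfid=0x8f4e2b1caa0140d2b7f30e59c1d2a3b4
EOF
)

if [ "$gfid_pserver5" = "$gfid_pserver3" ]; then
    echo "gfid match"
else
    echo "gfid mismatch: possible split-brain or stale linkfile"
fi
```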
Re: [Gluster-users] Gluster 3.2 optimization options?
As far as I know, those options should be the same between 3.1 and 3.2.

On Mon, May 16, 2011 at 9:11 AM, Justice London jlon...@lawinfo.com wrote:

Hey, are there any optimization options for 3.2 like there were for the 3.1 versions?

[...]
[Gluster-users] Rebuild Distributed/Replicated Setup
Hi,

I've got a distributed/replicated GlusterFS v3.1.2 (installed via RPM) setup across two servers (web01 and web02) with the following vol config:

volume shared-application-data-client-0
    type protocol/client
    option remote-host web01
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-client-1
    type protocol/client
    option remote-host web02
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-replicate-0
    type cluster/replicate
    subvolumes shared-application-data-client-0 shared-application-data-client-1
end-volume

volume shared-application-data-write-behind
    type performance/write-behind
    subvolumes shared-application-data-replicate-0
end-volume

volume shared-application-data-read-ahead
    type performance/read-ahead
    subvolumes shared-application-data-write-behind
end-volume

volume shared-application-data-io-cache
    type performance/io-cache
    subvolumes shared-application-data-read-ahead
end-volume

volume shared-application-data-quick-read
    type performance/quick-read
    subvolumes shared-application-data-io-cache
end-volume

volume shared-application-data-stat-prefetch
    type performance/stat-prefetch
    subvolumes shared-application-data-quick-read
end-volume

volume shared-application-data
    type debug/io-stats
    subvolumes shared-application-data-stat-prefetch
end-volume

In total, four servers mount this via the GlusterFS FUSE client. For whatever reason (I'm really not sure why), the GlusterFS filesystem has run into a bit of a split-brain nightmare (although to my knowledge an actual split-brain situation has never occurred in this environment), and I have been seeing solid corruption issues across the filesystem, as well as complaints that the filesystem cannot be self-healed.
What I would like to do is completely empty one of the two servers (here I am trying to empty web01), making the other one (web02) the authoritative source for the data, and then have web01 completely rebuild its mirror directly from web02. What's the easiest/safest way to do this? Is there a command that I can run that will force web01 to re-initialize its mirror directly from web02 (and thus completely eradicate all of the split-brain errors and data inconsistencies)?

Thanks!

--
Remi Broemeling
System Administrator
Clio - Practice Management Simplified
1-888-858-2546 x(2^5) | r...@goclio.com
www.goclio.com | blog http://www.goclio.com/blog | twitter http://www.twitter.com/goclio | facebook http://www.facebook.com/goclio
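Whatever rebuild route is taken, the end state is straightforward to verify: after the resync, the two brick directories should hold identical file content. A hedged sketch using checksum manifests, with two scratch directories standing in for the web01/web02 bricks (the real paths would be /var/glusterfs/bricks/shared on each server):

```shell
# Scratch directories standing in for the two bricks.
brick_web01=$(mktemp -d)
brick_web02=$(mktemp -d)
echo "application data" > "$brick_web01/config.yml"
echo "application data" > "$brick_web02/config.yml"

# Checksum manifest of a brick: md5 of every regular file, sorted by path.
manifest() { (cd "$1" && find . -type f -print0 | sort -z | xargs -0 md5sum); }

if [ "$(manifest "$brick_web01")" = "$(manifest "$brick_web02")" ]; then
    verdict=in-sync
else
    verdict=mismatch
fi
echo "$verdict"
```

On real bricks you would also want to exclude Gluster's internal bookkeeping (e.g. any .glusterfs directory) from the comparison.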
[Gluster-users] add brick fails
Hi,

I am trying out glusterfs_3.2.0 on Ubuntu Natty (11.04). I have 3 servers, and 2 servers are already added as peers with the following commands:

root@natty3:~# gluster peer probe natty3
root@natty3:~# gluster peer probe natty4

I also created a volume:

root@natty3:~# gluster volume create test-volume transport tcp natty3:/opt/gluster/distributed natty3:/opt/gluster/distributed
root@natty3:~# gluster volume info

Volume Name: test-volume
Type: Distribute
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: natty4:/opt/gluster/distributed
Brick2: natty3:/opt/gluster/distributed

But when I tried to add a brick, it failed:

root@natty3:~# gluster peer probe natty2
root@natty3:~# gluster volume add-brick test-volume natty2:/opt/gluster/distributed
Another operation is in progress, please retry after some time

/var/log/glusterfs/etc-glusterfs-glusterd.vol.log says:

[2011-05-13 09:42:29.77565] E [glusterd-handler.c:1288:glusterd_handle_add_brick] 0-: Unable to set cli op: 16
[2011-05-13 09:42:29.82016] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1020)

/var/log/glusterfs/cli.log says:

[2011-05-13 09:43:28.135429] W [rpc-transport.c:604:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to socket
[2011-05-13 09:43:28.221217] I [cli-rpc-ops.c:1010:gf_cli3_1_add_brick_cbk] 0-cli: Received resp to add brick
[2011-05-13 09:43:28.221348] I [input.c:46:cli_batch] 0-: Exiting with: -1

Mounting the volume is OK: if I mount test-volume from natty3 and natty4 and create some files, I can see them from both hosts. In addition, probing natty2 (the new server) is OK, because gluster peer status lists natty2.

I have no idea how to get past this. Any help is appreciated.

mkey
Re: [Gluster-users] add brick fails
Not sure if this is related, but do you know why you are seeing (127.0.0.1:1020)? Can you look at gluster peer status on all the hosts and see if they can see each other?

On Mon, May 16, 2011 at 11:17 AM, mkey m...@inter7.jp wrote:

Hi, I am trying out glusterfs_3.2.0 on Ubuntu Natty (11.04). I have 3 servers, and 2 servers are already added as peers...

[...]
Re: [Gluster-users] Rebuild Distributed/Replicated Setup
hi Remi,

Would it be possible to post the logs from the client, so that we can find out what issue you are running into?

Pranith

- Original Message -
From: Remi Broemeling r...@goclio.com
To: gluster-users@gluster.org
Sent: Monday, May 16, 2011 10:47:33 PM
Subject: [Gluster-users] Rebuild Distributed/Replicated Setup

Hi,

I've got a distributed/replicated GlusterFS v3.1.2 (installed via RPM) setup across two servers (web01 and web02)...

[...]
Re: [Gluster-users] Client and server file view, different results?! Client can't see the right file.
Martin,

Is this a distributed-replicate setup? Could you attach the vol-file of the client?

Pranith

- Original Message -
From: Martin Schenker martin.schen...@profitbricks.com
To: gluster-users@gluster.org
Sent: Monday, May 16, 2011 2:49:29 PM
Subject: [Gluster-users] Client and server file view, different results?! Client can't see the right file.

Hi all!

Here we have another mismatch between the client view and the server mounts. From the server side everything seems well; the 20G file is visible and the attributes seem to match:

0 root@pserver5:~ # getfattr -R -d -e hex -m trusted.afr. /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/
# file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images//20964
trusted.afr.storage0-client-2=0x
trusted.afr.storage0-client-3=0x

0 root@pserver5:~ # find /mnt/gluster/ -name 20964 | xargs -i ls -al {}
-rwxrwx--- 1 libvirt-qemu vcb 21474836480 May 13 11:21 /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20964

But the client view shows two files, both with 0 byte size! And these aren't link files created by Gluster (the ones with the T at the end):

0 root@pserver5:~ # find /opt/profitbricks/storage/ -name 20964 | xargs -i ls -al {}
-rwxrwx--- 1 libvirt-qemu kvm 0 May 13 11:24 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20964
-rwxrwx--- 1 libvirt-qemu kvm 0 May 13 11:24 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20964

I'm a bit stumped that we seem to have so many weird errors cropping up. Any ideas? I've checked the ext4 filesystem on all boxes; no real problems. We run a distributed cluster with 4 servers offering 2 bricks each.
Best, Martin

-Original Message-
From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, May 16, 2011 2:24 AM
To: Martin Schenker
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Brick pair file mismatch, self-heal problems?

Try this to trigger self heal:

find <gluster-mount> -noleaf -print0 -name '<file name>' | xargs --null stat > /dev/null

On Sun, May 15, 2011 at 11:20 AM, Martin Schenker martin.schen...@profitbricks.com wrote:

Can someone enlighten me what's going on here? We have two peers; the file 21313 is shown through the client mountpoint as 1 Jan 1970, the attributes on server pserver3 don't match, but NO self-heal or repair can be triggered through ls -alR?!?

Checking the files through the server mounts shows that two versions are on the system. But the wrong one (the one dated 1 Jan 1970) seems to be the preferred one by the client?!? Do I need to use setattr or what in order to get the client to see the RIGHT version?!? This is not the ONLY file displaying this problematic behaviour!

Thanks for any feedback.

Martin

pserver5:

0 root@pserver5:~ # ls -al /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images
-rwxrwx--- 1 libvirt-qemu vcb 483183820800 May 13 13:41 21313

0 root@pserver5:~ # getfattr -R -d -e hex -m trusted.afr. /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
trusted.afr.storage0-client-2=0x
trusted.afr.storage0-client-3=0x

0 root@pserver5:~ # ls -alR /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
-rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313

pserver3:

0 root@pserver3:~ # ls -al /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images
-rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 21313

0 root@pserver3:~ # ls -alR /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
-rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313

0 root@pserver3:~ # getfattr -R -d -e hex -m trusted.afr. /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
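[Editorial note: the trusted.afr.* values being compared above are AFR's pending-operation changelogs; the bare "0x" values in the quoted output appear to have been truncated by the mail archive. As a rough illustration of how a full value reads (the 12-byte layout of three big-endian counters for data, metadata, and entry operations matches the 3.1.x AFR format as I understand it; treat it as an assumption), a value can be decoded like this:]

```python
import struct

def decode_afr(hex_value: str) -> dict:
    """Decode a trusted.afr.* changelog value into pending-op counters.

    ASSUMPTION: the value is 12 bytes -- three big-endian 32-bit counters
    (data, metadata, entry), as in GlusterFS 3.1.x AFR.
    """
    h = hex_value[2:] if hex_value.startswith("0x") else hex_value
    data, metadata, entry = struct.unpack(">III", bytes.fromhex(h)[:12])
    return {"data": data, "metadata": metadata, "entry": entry}

# All-zero changelog: this brick records no pending operations for the peer.
print(decode_afr("0x000000000000000000000000"))
# A non-zero data counter would mean writes are pending against the peer,
# i.e. the peer's copy is stale and a candidate for self-heal.
print(decode_afr("0x000000030000000000000000"))
```

When both bricks show all-zero changelogs for each other, as the thread's output suggests, AFR sees nothing to heal, which is consistent with `ls -alR` not triggering any repair.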
Re: [Gluster-users] Client and server file view, different results?! Client can't see the right file.
Yes, it is! Here's the volfile:

cat /mnt/gluster/brick0/config/vols/storage0/storage0-fuse.vol:

volume storage0-client-0
    type protocol/client
    option remote-host de-dc1-c1-pserver3
    option remote-subvolume /mnt/gluster/brick0/storage
    option transport-type rdma
    option ping-timeout 5
end-volume

volume storage0-client-1
    type protocol/client
    option remote-host de-dc1-c1-pserver5
    option remote-subvolume /mnt/gluster/brick0/storage
    option transport-type rdma
    option ping-timeout 5
end-volume

volume storage0-client-2
    type protocol/client
    option remote-host de-dc1-c1-pserver3
    option remote-subvolume /mnt/gluster/brick1/storage
    option transport-type rdma
    option ping-timeout 5
end-volume

volume storage0-client-3
    type protocol/client
    option remote-host de-dc1-c1-pserver5
    option remote-subvolume /mnt/gluster/brick1/storage
    option transport-type rdma
    option ping-timeout 5
end-volume

volume storage0-client-4
    type protocol/client
    option remote-host de-dc1-c1-pserver12
    option remote-subvolume /mnt/gluster/brick0/storage
    option transport-type rdma
    option ping-timeout 5
end-volume

volume storage0-client-5
    type protocol/client
    option remote-host de-dc1-c1-pserver13
    option remote-subvolume /mnt/gluster/brick0/storage
    option transport-type rdma
    option ping-timeout 5
end-volume

volume storage0-client-6
    type protocol/client
    option remote-host de-dc1-c1-pserver12
    option remote-subvolume /mnt/gluster/brick1/storage
    option transport-type rdma
    option ping-timeout 5
end-volume

volume storage0-client-7
    type protocol/client
    option remote-host de-dc1-c1-pserver13
    option remote-subvolume /mnt/gluster/brick1/storage
    option transport-type rdma
    option ping-timeout 5
end-volume

volume storage0-replicate-0
    type cluster/replicate
    subvolumes storage0-client-0 storage0-client-1
end-volume

volume storage0-replicate-1
    type cluster/replicate
    subvolumes storage0-client-2 storage0-client-3
end-volume

volume storage0-replicate-2
    type cluster/replicate
    subvolumes storage0-client-4 storage0-client-5
end-volume
volume storage0-replicate-3
    type cluster/replicate
    subvolumes storage0-client-6 storage0-client-7
end-volume

volume storage0-dht
    type cluster/distribute
    subvolumes storage0-replicate-0 storage0-replicate-1 storage0-replicate-2 storage0-replicate-3
end-volume

volume storage0-write-behind
    type performance/write-behind
    subvolumes storage0-dht
end-volume

volume storage0-read-ahead
    type performance/read-ahead
    subvolumes storage0-write-behind
end-volume

volume storage0-io-cache
    type performance/io-cache
    option cache-size 4096MB
    subvolumes storage0-read-ahead
end-volume

volume storage0-quick-read
    type performance/quick-read
    option cache-size 4096MB
    subvolumes storage0-io-cache
end-volume

volume storage0-stat-prefetch
    type performance/stat-prefetch
    subvolumes storage0-quick-read
end-volume

volume storage0
    type debug/io-stats
    subvolumes storage0-stat-prefetch
end-volume

-Original Message-
From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com]
Sent: Tuesday, May 17, 2011 7:16 AM
To: Martin Schenker
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Client and server file view, different results?! Client can't see the right file.

Martin,

Is this a distributed-replicate setup? Could you attach the vol-file of the client?

Pranith
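[Editorial note: Mohit's find | stat suggestion earlier in the thread, cleaned up into a reusable form. This is a sketch only: /tmp/demo-mount is a scratch directory standing in for a real GlusterFS client mountpoint, and 20964 is the file name from the thread.]

```shell
# Sketch of the find | stat self-heal trigger from earlier in the thread.
# Stat-ing a file through the client mount makes AFR compare (and, where
# needed, heal) the replicas of that file. /tmp/demo-mount is a scratch
# stand-in for a real client mountpoint.
trigger_selfheal() {
    mountpoint=$1
    name=$2
    find "$mountpoint" -noleaf -print0 -name "$name" | xargs --null stat > /dev/null
}

mkdir -p /tmp/demo-mount
touch /tmp/demo-mount/20964
trigger_selfheal /tmp/demo-mount 20964 && echo trigger-ok
```

The -print0 / --null pairing keeps file names with spaces intact; stat's output is discarded because only the lookup side effect matters.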