Re: [Gluster-users] glusterfs-3.2.x after rebalance, read failed
I tested glusterfs 3.3.x: after rebalance, the file distribution looks the same as on 3.2.x. Fortunately, all files can still be accessed.

> lierihanmei wrote:
>
> HI all.
> I have got a problem. after remove-bricks and rebalanced, when i read some
> file ,it gets io errors.
>
> version:glusterfs 3.2.7
> OS :Centos6
>
> steps:
> 1, gluster volume create str stripe 2 ip:/data1 ip:/data2 ip:/data3 ip:/data4
> ip:/data5 ip:/data6
> 2. gluster volume start str
> 3. mount -t glusterfs ip:/str /mnt/str
> 4. write some file into /mnt/str ( then from /mnt/str, all files are OK, can
> be read and write)
> 5. gluster volume remove-brick str ip:/data3 ip:/data4(all files are OK too)
> 6, gluster rebalance str start
> 7, after rebalance complete(some files are OK, some can not access, get IO
> ERROR)
> and i test glusterfs 3.2.5, 3.2.6, it get the same error.
>
> after rebalance , the data are all in ip:/data5 and ip:/data6, but in
> ip:/data1 and ip:/data2only link file like "-T 1 root root 0 Mar 26
> 14:07 /data/str1/file11" retains.
>
> files which can not get all have link files.
> the link file's attr:
> [root@centos6-template dht]# getfattr -m . -e hex -d /data/str1/file11
> getfattr: Removing leading '/' from absolute path names
> # file: data/str1/file11
> trusted.gfid=0xef8c64fa4516424a80bbcc65ad988780
> trusted.glusterfs.dht.linkto=0x7374722d7374726970652d3100
> trusted.str-stripe-0.stripe-count=0x3200
> trusted.str-stripe-0.stripe-index=0x3000
> trusted.str-stripe-0.stripe-size=0x31333130373200
> the data file's attr:
> [root@centos6-template dht]# getfattr -m . -e hex -d /data/str6/file11
> getfattr: Removing leading '/' from absolute path names
> # file: data/str6/file11
> trusted.gfid=0xef8c64fa4516424a80bbcc65ad988780
> trusted.str-stripe-2.stripe-count=0x3200
> trusted.str-stripe-2.stripe-index=0x3100
> trusted.str-stripe-2.stripe-size=0x31333130373200
>
> files that can access:
> [root@centos6-template dht]# getfattr -m . -e hex -d /data/str6/file6
> getfattr: Removing leading '/' from absolute path names
> # file: data/str6/file6
> trusted.gfid=0xf1d6a5ec4f054926a65a1114ed3ed619
> trusted.glusterfs.dht.linkto=0x7374722d7374726970652d3000
> trusted.str-stripe-1.stripe-count=0x3200
> trusted.str-stripe-1.stripe-index=0x3100
> trusted.str-stripe-1.stripe-size=0x31333130373200
> but from mountpoint , the attr is:
> [root@centos6-template dht]# getfattr -m . -e hex -d file6
> # file: file6
> trusted.str-stripe-0.stripe-count=0x3200
> trusted.str-stripe-0.stripe-index=0x3000
> trusted.str-stripe-0.stripe-size=0x31333130373200
>
> log of glusterfs:
> nsport (str-dht2-client-2)
> [2013-03-26 11:56:30.957186] T [rpc-clnt.c:1224:rpc_clnt_record]
> 0-str-dht2-client-3: Auth Info: pid: 0, uid: 0, gid: 0, owner: 0
> [2013-03-26 11:56:30.957202] T [rpc-clnt.c:1125:rpc_clnt_record_build_header]
> 0-rpc-clnt: Request fraglen 344, payload: 216, rpc hdr: 128
> [2013-03-26 11:56:30.957233] T [rpc-clnt.c:1429:rpc_clnt_submit] 0-rpc-clnt:
> submitted request (XID: 0x11x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27)
> to rpc-transport (str-dht2-client-3)
> [2013-03-26 11:56:30.957479] T [rpc-clnt.c:638:rpc_clnt_reply_init]
> 0-str-dht2-client-2: received rpc message (RPC XID: 0x11x Program: GlusterFS
> 3.1, ProgVers: 310, Proc: 27) from rpc-transport (str-dht2-client-2)
> [2013-03-26 11:56:30.957536] T [rpc-clnt.c:638:rpc_clnt_reply_init]
> 0-str-dht2-client-3: received rpc message (RPC XID: 0x11x Program: GlusterFS
> 3.1, ProgVers: 310, Proc: 27) from rpc-transport (str-dht2-client-3)
> [2013-03-26 11:56:30.957563] D [stripe.c:2673:stripe_open_lookup_cbk]
> 0-str-dht2-stripe-1: /file11: stripe info need to be healed
> [2013-03-26 11:56:30.957576] E [stripe.c:2691:stripe_open_lookup_cbk]
> 0-str-dht2-stripe-1: stripe size not set
> [2013-03-26 11:56:30.957591] D [dht-common.c:2527:dht_fd_cbk] 0-str-dht2-dht:
> subvolume str-dht2-stripe-1 returned -1 (Input/output error)
> [2013-03-26 11:56:30.957618] W [quick-read.c:1640:qr_fstat_helper]
> 0-str-dht2-quick-read: open failed on path (/file11) (Input/output error),
> unwinding fstat call
> [2013-03-26 11:56:30.957643] W [fuse-bridge.c:516:fuse_attr_cbk]
> 0-glusterfs-fuse: 7: FSTAT() /file11 => -1 (Input/output error)
> [2013-03-26 11:56:30.957869] T [fuse-bridge.c:2086:fuse_flush]
> 0-glusterfs-fuse: 8: FLUSH 0x7f1f9eb69024
> [2013-03-26 11:56:30.957956] T [fuse-bridge.c:994:fuse_err_cbk]
> 0-glusterfs-fuse: 8: FLUSH() ERR => 0
> [2013-03-26 11:56:30.957998] T [fuse-bridge.c:2110:fuse_release]
> 0-glusterfs-fuse: 9: RELEASE 0x7f1f9eb69024
> [2013-03-26 11:56:33.809728] T [rpc-clnt.c:1224:rpc_clnt_record]
> 0-str-dht2-client-0: Auth Info: pid: 9105, uid: 0, gid: 0, owner: 9105
> [2013-03-26 11:56:33.809759] T [rpc-clnt.c:1125:rpc_clnt_record_build_header]
> 0-rpc-clnt: Request fraglen 284, payload: 156, rpc hdr: 128
> [2013-03-26 11:56:33.809794] T [rpc-clnt.c:1429:rpc_clnt_submit] 0-rpc-clnt:
> submitted request (XID: 0x12x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27)
> to rpc-transport (str-dht2-client-0)
> [20
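The hex-encoded extended attributes quoted above are easier to compare once decoded. The following is a minimal sketch, not part of the original report: the helper name `decode_xattr_hex` is hypothetical, and it assumes each value is a NUL-terminated ASCII string, which is how the values printed by `getfattr -e hex` in this thread appear to be laid out.

```python
def decode_xattr_hex(value: str) -> str:
    """Decode a value printed by `getfattr -e hex` (e.g. '0x3200') into text."""
    if not value.startswith("0x"):
        raise ValueError("expected a 0x-prefixed hex string")
    raw = bytes.fromhex(value[2:])
    # Assumption: the value is ASCII followed by a trailing NUL byte.
    return raw.rstrip(b"\x00").decode("ascii")


# Attributes of the link file /data/str1/file11, copied from the report.
link_file_xattrs = {
    "trusted.glusterfs.dht.linkto": "0x7374722d7374726970652d3100",
    "trusted.str-stripe-0.stripe-count": "0x3200",
    "trusted.str-stripe-0.stripe-index": "0x3000",
    "trusted.str-stripe-0.stripe-size": "0x31333130373200",
}

for name, value in link_file_xattrs.items():
    print(f"{name} = {decode_xattr_hex(value)!r}")
```

Decoded this way, the link file points at `str-stripe-1` with a stripe count of 2, index 0, and size 131072. The data file on /data/str6 carries its stripe attributes under the `trusted.str-stripe-2.*` keys instead, which may be worth comparing against the "stripe size not set" error that `str-dht2-stripe-1` reports in the log.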
Re: [Gluster-users] Slow read performance
Sorry for the late reply. The call profiles look OK on the server side. I suspect it is still something to do with the client or network. Have you mounted the FUSE client with any special options? like --direct-io-mode? That can have a significant impact on read performance as read-ahead in the page-cache (which is way more efficient than gluster's read-ahead translator due to lack of context switch to serve the future page) is effectively turned off.

I'm not sure if any of your networking (tcp/ip) configuration is either good or bad.

Avati

On Mon, Mar 11, 2013 at 9:02 AM, Thomas Wakefield wrote:
> Is there a way to make a ramdisk support extended attributes?
>
> These are my current sysctl settings (and I have tried many different
> options):
> net.ipv4.ip_forward = 0
> net.ipv4.conf.default.rp_filter = 1
> net.ipv4.conf.default.accept_source_route = 0
> kernel.sysrq = 0
> kernel.core_uses_pid = 1
> net.ipv4.tcp_syncookies = 1
> kernel.msgmnb = 65536
> kernel.msgmax = 65536
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
> kernel.panic = 5
> net.core.rmem_max = 67108864
> net.core.wmem_max = 67108864
> net.ipv4.tcp_rmem = 4096 87380 67108864
> net.ipv4.tcp_wmem = 4096 65536 67108864
> net.core.netdev_max_backlog = 25
> net.ipv4.tcp_congestion_control = htcp
> net.ipv4.tcp_mtu_probing = 1
>
> Here is the output from a dd write and dd read.
>
> [root@cpu_crew1 ~]# dd if=/dev/zero
> of=/shared/working/benchmark/test.cpucrew1 bs=512k count=1 ; dd
> if=/shared/working/benchmark/test.cpucrew1 of=/dev/null bs=512k
> 1+0 records in
> 1+0 records out
> 524288 bytes (5.2 GB) copied, 7.21958 seconds, 726 MB/s
> 1+0 records in
> 1+0 records out
> 524288 bytes (5.2 GB) copied, 86.4165 seconds, 60.7 MB/s

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
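As a rough sanity check on the quoted dd figures, the transfer size can be estimated from the reported rate and elapsed time. This is only a sketch under the assumption that dd reports decimal megabytes (1 MB = 10^6 bytes); the function name `implied_bytes` is made up for illustration.

```python
MB = 1_000_000  # assumption: dd's "MB/s" means 10**6 bytes per second


def implied_bytes(rate_mb_per_s: float, seconds: float) -> float:
    """Estimate bytes transferred from a reported rate and elapsed time."""
    return rate_mb_per_s * MB * seconds


write_bytes = implied_bytes(726, 7.21958)   # figures from the dd write
read_bytes = implied_bytes(60.7, 86.4165)   # figures from the dd read

print(f"write: ~{write_bytes / 1e9:.2f} GB")
print(f"read:  ~{read_bytes / 1e9:.2f} GB")
print(f"read is ~{726 / 60.7:.1f}x slower than write")
```

Both estimates come out near 5.24 GB, which agrees with the "(5.2 GB)" that dd prints and not with the byte and record counts as they appear in the archived message, so those counts were probably shortened somewhere along the way. On these numbers the read runs roughly twelve times slower than the write.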
[Gluster-users] Blog post: Troubleshooting GlusterFS performance issues
All,

I've been working on a new GlusterFS deployment over the last week and I wrote up some of the experiences I had while trying to get the setup ready for production:

http://mjanja.co.ke/2013/03/troubleshooting-glusterfs-performance-issues/

In a nutshell, I was trying to figure out why my shiny new servers weren't giving me the speed I assumed they would out of the box (something I'm sure every new user goes through!). It was quite enlightening to run through the various components looking for bottlenecks. In the end I found that NFS was many times faster than FUSE... which seems to contradict what most people see.

I haven't yet deployed my Gluster, so there's still time for more architectural changes if new information comes in!

Hope it helps someone! Comments and questions appreciated.

Thanks,

--
Alan Orth
alan.o...@gmail.com
http://alaninkenya.org
http://mjanja.co.ke
"I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone." -Bjarne Stroustrup, inventor of C++
Re: [Gluster-users] glusterfs-3.2.x after rebalance, read failed
Moving to gluster-users.

lierihanmei wrote:

HI all.
I have got a problem. after remove-bricks and rebalanced, when i read some file ,it gets io errors.

version:glusterfs 3.2.7
OS :Centos6

steps:
1, gluster volume create str stripe 2 ip:/data1 ip:/data2 ip:/data3 ip:/data4 ip:/data5 ip:/data6
2. gluster volume start str
3. mount -t glusterfs ip:/str /mnt/str
4. write some file into /mnt/str ( then from /mnt/str, all files are OK, can be read and write)
5. gluster volume remove-brick str ip:/data3 ip:/data4(all files are OK too)
6, gluster rebalance str start
7, after rebalance complete(some files are OK, some can not access, get IO ERROR)
and i test glusterfs 3.2.5, 3.2.6, it get the same error.

after rebalance , the data are all in ip:/data5 and ip:/data6, but in ip:/data1 and ip:/data2only link file like "-T 1 root root 0 Mar 26 14:07 /data/str1/file11" retains.

files which can not get all have link files.
the link file's attr:
[root@centos6-template dht]# getfattr -m . -e hex -d /data/str1/file11
getfattr: Removing leading '/' from absolute path names
# file: data/str1/file11
trusted.gfid=0xef8c64fa4516424a80bbcc65ad988780
trusted.glusterfs.dht.linkto=0x7374722d7374726970652d3100
trusted.str-stripe-0.stripe-count=0x3200
trusted.str-stripe-0.stripe-index=0x3000
trusted.str-stripe-0.stripe-size=0x31333130373200
the data file's attr:
[root@centos6-template dht]# getfattr -m . -e hex -d /data/str6/file11
getfattr: Removing leading '/' from absolute path names
# file: data/str6/file11
trusted.gfid=0xef8c64fa4516424a80bbcc65ad988780
trusted.str-stripe-2.stripe-count=0x3200
trusted.str-stripe-2.stripe-index=0x3100
trusted.str-stripe-2.stripe-size=0x31333130373200

files that can access:
[root@centos6-template dht]# getfattr -m . -e hex -d /data/str6/file6
getfattr: Removing leading '/' from absolute path names
# file: data/str6/file6
trusted.gfid=0xf1d6a5ec4f054926a65a1114ed3ed619
trusted.glusterfs.dht.linkto=0x7374722d7374726970652d3000
trusted.str-stripe-1.stripe-count=0x3200
trusted.str-stripe-1.stripe-index=0x3100
trusted.str-stripe-1.stripe-size=0x31333130373200
but from mountpoint , the attr is:
[root@centos6-template dht]# getfattr -m . -e hex -d file6
# file: file6
trusted.str-stripe-0.stripe-count=0x3200
trusted.str-stripe-0.stripe-index=0x3000
trusted.str-stripe-0.stripe-size=0x31333130373200

log of glusterfs:
nsport (str-dht2-client-2)
[2013-03-26 11:56:30.957186] T [rpc-clnt.c:1224:rpc_clnt_record] 0-str-dht2-client-3: Auth Info: pid: 0, uid: 0, gid: 0, owner: 0
[2013-03-26 11:56:30.957202] T [rpc-clnt.c:1125:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 344, payload: 216, rpc hdr: 128
[2013-03-26 11:56:30.957233] T [rpc-clnt.c:1429:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x11x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (str-dht2-client-3)
[2013-03-26 11:56:30.957479] T [rpc-clnt.c:638:rpc_clnt_reply_init] 0-str-dht2-client-2: received rpc message (RPC XID: 0x11x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) from rpc-transport (str-dht2-client-2)
[2013-03-26 11:56:30.957536] T [rpc-clnt.c:638:rpc_clnt_reply_init] 0-str-dht2-client-3: received rpc message (RPC XID: 0x11x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) from rpc-transport (str-dht2-client-3)
[2013-03-26 11:56:30.957563] D [stripe.c:2673:stripe_open_lookup_cbk] 0-str-dht2-stripe-1: /file11: stripe info need to be healed
[2013-03-26 11:56:30.957576] E [stripe.c:2691:stripe_open_lookup_cbk] 0-str-dht2-stripe-1: stripe size not set
[2013-03-26 11:56:30.957591] D [dht-common.c:2527:dht_fd_cbk] 0-str-dht2-dht: subvolume str-dht2-stripe-1 returned -1 (Input/output error)
[2013-03-26 11:56:30.957618] W [quick-read.c:1640:qr_fstat_helper] 0-str-dht2-quick-read: open failed on path (/file11) (Input/output error), unwinding fstat call
[2013-03-26 11:56:30.957643] W [fuse-bridge.c:516:fuse_attr_cbk] 0-glusterfs-fuse: 7: FSTAT() /file11 => -1 (Input/output error)
[2013-03-26 11:56:30.957869] T [fuse-bridge.c:2086:fuse_flush] 0-glusterfs-fuse: 8: FLUSH 0x7f1f9eb69024
[2013-03-26 11:56:30.957956] T [fuse-bridge.c:994:fuse_err_cbk] 0-glusterfs-fuse: 8: FLUSH() ERR => 0
[2013-03-26 11:56:30.957998] T [fuse-bridge.c:2110:fuse_release] 0-glusterfs-fuse: 9: RELEASE 0x7f1f9eb69024
[2013-03-26 11:56:33.809728] T [rpc-clnt.c:1224:rpc_clnt_record] 0-str-dht2-client-0: Auth Info: pid: 9105, uid: 0, gid: 0, owner: 9105
[2013-03-26 11:56:33.809759] T [rpc-clnt.c:1125:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 284, payload: 156, rpc hdr: 128
[2013-03-26 11:56:33.809794] T [rpc-clnt.c:1429:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x12x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (str-dht2-client-0)
[2013-03-26 11:56:33.809824] T [rpc-clnt.c:1224:rpc_clnt_record] 0-str-dht2-client-1: Auth Info: pid: 9105, uid: 0, gid: 0, owner: 9105
[2013-03-26 11:56:33.809840] T [rpc-clnt.c:1125:rpc_clnt_record_buil
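Most of the trace above is T (trace) and D (debug) noise, so it can help to pull out only the error and warning entries. Below is a small sketch that does this with a regular expression. The pattern is inferred from the entries quoted in this message, not taken from any GlusterFS documentation; it assumes one log entry per line, and `parse_entries` is a hypothetical helper name.

```python
import re

# Layout inferred from the quoted entries:
# [timestamp] LEVEL [file.c:line:function] translator-name: message
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>\S+): (?P<msg>.*)"
)


def parse_entries(text, levels=("E", "W")):
    """Return matching entries as dicts, keeping only the given levels."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            entries.append(match.groupdict())
    return entries


# Three entries copied from the log above, one per line.
sample = """\
[2013-03-26 11:56:30.957563] D [stripe.c:2673:stripe_open_lookup_cbk] 0-str-dht2-stripe-1: /file11: stripe info need to be healed
[2013-03-26 11:56:30.957576] E [stripe.c:2691:stripe_open_lookup_cbk] 0-str-dht2-stripe-1: stripe size not set
[2013-03-26 11:56:30.957643] W [fuse-bridge.c:516:fuse_attr_cbk] 0-glusterfs-fuse: 7: FSTAT() /file11 => -1 (Input/output error)
"""

for entry in parse_entries(sample):
    print(entry["level"], entry["xlator"], entry["func"], "->", entry["msg"])
```

Applied to the full log, a filter like this narrows the output to the `stripe_open_lookup_cbk` error and the warnings that follow it up through quick-read and fuse-bridge, which is the chain that ends in the Input/output error seen at the mount point.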