Re: [Gluster-users] glusterfs-3.2.x after rebalance, read failed

2013-03-26 Thread John Mark Walker
Moving to gluster-users.

lierihanmei lierihan...@163.com wrote:

Hi all,
I have a problem: after a remove-brick and a rebalance, reading some files
returns I/O errors.


Version: glusterfs 3.2.7
OS:      CentOS 6


Steps:
1. gluster volume create str stripe 2 ip:/data1 ip:/data2 ip:/data3 ip:/data4 
ip:/data5 ip:/data6
2. gluster volume start str
3. mount -t glusterfs ip:/str /mnt/str
4. Write some files into /mnt/str (at this point all files are OK and can be 
read and written from /mnt/str).
5. gluster volume remove-brick str ip:/data3 ip:/data4 (all files are still OK)
6. gluster volume rebalance str start
7. After the rebalance completes, some files are OK, but others cannot be 
accessed and return an I/O error.

I also tested glusterfs 3.2.5 and 3.2.6 and got the same error.
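
For readers following the steps, here is a rough sketch of how the six bricks would pair up under "stripe 2", and how the pairs might be renumbered once ip:/data3 and ip:/data4 are removed. The "<volname>-stripe-<n>" naming matches the xattr keys shown further down in this post; the grouping of consecutive bricks and the renumbering after remove-brick are assumptions made for illustration, not something taken from the GlusterFS source.

```python
# Illustrative only: groups consecutive bricks into stripe sets and names them
# "<volname>-stripe-<n>". The renumbering after remove-brick is an assumption.

def stripe_sets(volname, bricks, stripe_count):
    """Group consecutive bricks into stripe sets, numbered from 0."""
    if len(bricks) % stripe_count != 0:
        raise ValueError("brick count must be a multiple of the stripe count")
    return {
        f"{volname}-stripe-{i // stripe_count}": bricks[i:i + stripe_count]
        for i in range(0, len(bricks), stripe_count)
    }

bricks = [f"ip:/data{n}" for n in range(1, 7)]
before = stripe_sets("str", bricks, 2)

# Step 5 removes the middle pair; regroup what is left.
removed = {"ip:/data3", "ip:/data4"}
after = stripe_sets("str", [b for b in bricks if b not in removed], 2)

for name, members in before.items():
    print("before:", name, members)
for name, members in after.items():
    print("after: ", name, members)
```

Under these assumptions the pair ip:/data5 + ip:/data6 is called str-stripe-2 before the removal and str-stripe-1 afterwards.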


After the rebalance, all of the data is on ip:/data5 and ip:/data6, while
ip:/data1 and ip:/data2 retain only link files such as:
---------T 1 root root 0 Mar 26 14:07 /data/str1/file11


All of the files that cannot be read have link files.
The link file's attributes:
[root@centos6-template dht]# getfattr -m . -e hex -d /data/str1/file11
getfattr: Removing leading '/' from absolute path names
# file: data/str1/file11
trusted.gfid=0xef8c64fa4516424a80bbcc65ad988780
trusted.glusterfs.dht.linkto=0x7374722d7374726970652d3100
trusted.str-stripe-0.stripe-count=0x3200
trusted.str-stripe-0.stripe-index=0x3000
trusted.str-stripe-0.stripe-size=0x31333130373200
The data file's attributes:
[root@centos6-template dht]# getfattr -m . -e hex -d /data/str6/file11
getfattr: Removing leading '/' from absolute path names
# file: data/str6/file11
trusted.gfid=0xef8c64fa4516424a80bbcc65ad988780
trusted.str-stripe-2.stripe-count=0x3200
trusted.str-stripe-2.stripe-index=0x3100
trusted.str-stripe-2.stripe-size=0x31333130373200
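
The values above are printed with "-e hex", so they are NUL-terminated ASCII strings shown as hex. A small sketch for reading them; the helper name decode_xattr_hex is made up for this example.

```python
def decode_xattr_hex(value):
    """Decode a getfattr '-e hex' value such as '0x3200' into text."""
    if not value.startswith("0x"):
        raise ValueError("expected a 0x-prefixed hex string")
    raw = bytes.fromhex(value[2:])
    # The values in this post end in a trailing NUL byte; drop it.
    return raw.rstrip(b"\x00").decode("ascii")

print(decode_xattr_hex("0x7374722d7374726970652d3100"))  # str-stripe-1
print(decode_xattr_hex("0x3200"))                        # 2
print(decode_xattr_hex("0x3100"))                        # 1
print(decode_xattr_hex("0x31333130373200"))              # 131072
```

So the link file's linkto points at str-stripe-1, and the stripe attributes read as count 2, index 0 or 1, and size 131072 bytes (128 KiB).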


A file that can be accessed:
[root@centos6-template dht]# getfattr -m . -e hex -d /data/str6/file6
getfattr: Removing leading '/' from absolute path names
# file: data/str6/file6
trusted.gfid=0xf1d6a5ec4f054926a65a1114ed3ed619
trusted.glusterfs.dht.linkto=0x7374722d7374726970652d3000
trusted.str-stripe-1.stripe-count=0x3200
trusted.str-stripe-1.stripe-index=0x3100
trusted.str-stripe-1.stripe-size=0x31333130373200
But from the mount point, the attributes are:
[root@centos6-template dht]# getfattr -m . -e hex -d file6
# file: file6
trusted.str-stripe-0.stripe-count=0x3200
trusted.str-stripe-0.stripe-index=0x3000
trusted.str-stripe-0.stripe-size=0x31333130373200
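
One possible way to read the dumps for file11, offered as a hypothesis rather than a confirmed diagnosis: the link file points at one stripe subvolume name, while the stripe xattrs on the data file are keyed with a different one. The sketch below only compares the names found in the two dumps; the function names are invented for this example.

```python
import re

# Matches keys like "trusted.str-stripe-2.stripe-size=0x..."
STRIPE_KEY = re.compile(r"^trusted\.(?P<subvol>.+-stripe-\d+)\.stripe-size=")
LINKTO_PREFIX = "trusted.glusterfs.dht.linkto=0x"

def stripe_subvol_in_keys(dump):
    """Return the subvolume name embedded in the stripe-size xattr key, if any."""
    for line in dump.splitlines():
        match = STRIPE_KEY.match(line.strip())
        if match:
            return match.group("subvol")
    return None

def linkto_target(dump):
    """Return the decoded dht.linkto value from a getfattr hex dump, if any."""
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith(LINKTO_PREFIX):
            raw = bytes.fromhex(line[len(LINKTO_PREFIX):])
            return raw.rstrip(b"\x00").decode("ascii")
    return None

link_dump = """\
trusted.gfid=0xef8c64fa4516424a80bbcc65ad988780
trusted.glusterfs.dht.linkto=0x7374722d7374726970652d3100
trusted.str-stripe-0.stripe-size=0x31333130373200
"""
data_dump = """\
trusted.gfid=0xef8c64fa4516424a80bbcc65ad988780
trusted.str-stripe-2.stripe-size=0x31333130373200
"""

target = linkto_target(link_dump)
keyed_as = stripe_subvol_in_keys(data_dump)
print("link points to:", target)
print("data file xattrs keyed as:", keyed_as)
print("names agree:", target == keyed_as)
```

For file11 the two names differ (str-stripe-1 versus str-stripe-2). Whether that mismatch is what triggers the "stripe size not set" error in the log is an assumption that someone who knows the stripe translator would need to confirm.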


glusterfs log:
nsport (str-dht2-client-2)
[2013-03-26 11:56:30.957186] T [rpc-clnt.c:1224:rpc_clnt_record] 
0-str-dht2-client-3: Auth Info: pid: 0, uid: 0, gid: 0, owner: 0
[2013-03-26 11:56:30.957202] T [rpc-clnt.c:1125:rpc_clnt_record_build_header] 
0-rpc-clnt: Request fraglen 344, payload: 216, rpc hdr: 128
[2013-03-26 11:56:30.957233] T [rpc-clnt.c:1429:rpc_clnt_submit] 0-rpc-clnt: 
submitted request (XID: 0x11x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) 
to rpc-transport (str-dht2-client-3)
[2013-03-26 11:56:30.957479] T [rpc-clnt.c:638:rpc_clnt_reply_init] 
0-str-dht2-client-2: received rpc message (RPC XID: 0x11x Program: GlusterFS 
3.1, ProgVers: 310, Proc: 27) from rpc-transport (str-dht2-client-2)
[2013-03-26 11:56:30.957536] T [rpc-clnt.c:638:rpc_clnt_reply_init] 
0-str-dht2-client-3: received rpc message (RPC XID: 0x11x Program: GlusterFS 
3.1, ProgVers: 310, Proc: 27) from rpc-transport (str-dht2-client-3)
[2013-03-26 11:56:30.957563] D [stripe.c:2673:stripe_open_lookup_cbk] 
0-str-dht2-stripe-1: /file11: stripe info need to be healed
[2013-03-26 11:56:30.957576] E [stripe.c:2691:stripe_open_lookup_cbk] 
0-str-dht2-stripe-1: stripe size not set
[2013-03-26 11:56:30.957591] D [dht-common.c:2527:dht_fd_cbk] 0-str-dht2-dht: 
subvolume str-dht2-stripe-1 returned -1 (Input/output error)
[2013-03-26 11:56:30.957618] W [quick-read.c:1640:qr_fstat_helper] 
0-str-dht2-quick-read: open failed on path (/file11) (Input/output error), 
unwinding fstat call
[2013-03-26 11:56:30.957643] W [fuse-bridge.c:516:fuse_attr_cbk] 
0-glusterfs-fuse: 7: FSTAT() /file11 = -1 (Input/output error)
[2013-03-26 11:56:30.957869] T [fuse-bridge.c:2086:fuse_flush] 
0-glusterfs-fuse: 8: FLUSH 0x7f1f9eb69024
[2013-03-26 11:56:30.957956] T [fuse-bridge.c:994:fuse_err_cbk] 
0-glusterfs-fuse: 8: FLUSH() ERR = 0
[2013-03-26 11:56:30.957998] T [fuse-bridge.c:2110:fuse_release] 
0-glusterfs-fuse: 9: RELEASE 0x7f1f9eb69024
[2013-03-26 11:56:33.809728] T [rpc-clnt.c:1224:rpc_clnt_record] 
0-str-dht2-client-0: Auth Info: pid: 9105, uid: 0, gid: 0, owner: 9105
[2013-03-26 11:56:33.809759] T [rpc-clnt.c:1125:rpc_clnt_record_build_header] 
0-rpc-clnt: Request fraglen 284, payload: 156, rpc hdr: 128
[2013-03-26 11:56:33.809794] T [rpc-clnt.c:1429:rpc_clnt_submit] 0-rpc-clnt: 
submitted request (XID: 0x12x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) 
to rpc-transport (str-dht2-client-0)
[2013-03-26 11:56:33.809824] T [rpc-clnt.c:1224:rpc_clnt_record] 
0-str-dht2-client-1: Auth Info: pid: 9105, uid: 0, gid: 0, owner: 9105
[2013-03-26 11:56:33.809840] T 
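
Most of the trace above is T (trace) and D (debug) output. A minimal sketch for pulling out only the E and W records from a log pasted like this one, where each record has been wrapped over two or three lines. The regular expression is based only on the line shape visible in this post, and parse_records is a hypothetical helper, not a GlusterFS tool.

```python
import re

# "[timestamp] LEVEL [file:line:function] rest-of-message"
RECORD_START = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) \[([^\]]+)\]\s*(.*)$"
)

def parse_records(text):
    """Join wrapped lines into records with time, level, source and message."""
    records = []
    for line in text.splitlines():
        match = RECORD_START.match(line)
        if match:
            time, level, source, rest = match.groups()
            records.append({"time": time, "level": level,
                            "source": source, "message": rest.strip()})
        elif records and line.strip():
            # Continuation of the previous, wrapped record.
            joined = records[-1]["message"] + " " + line.strip()
            records[-1]["message"] = joined.strip()
    return records

sample = """\
[2013-03-26 11:56:30.957563] D [stripe.c:2673:stripe_open_lookup_cbk] 
0-str-dht2-stripe-1: /file11: stripe info need to be healed
[2013-03-26 11:56:30.957576] E [stripe.c:2691:stripe_open_lookup_cbk] 
0-str-dht2-stripe-1: stripe size not set
[2013-03-26 11:56:30.957643] W [fuse-bridge.c:516:fuse_attr_cbk] 
0-glusterfs-fuse: 7: FSTAT() /file11 = -1 (Input/output error)
"""

records = parse_records(sample)
problems = [r for r in records if r["level"] in ("E", "W")]
for r in problems:
    print(r["time"], r["level"], r["source"], r["message"])
```

Run against the full log, this would leave the "stripe size not set" error and the I/O error warnings that follow it.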

[Gluster-users] Blog post: Troubleshooting GlusterFS performance issues

2013-03-26 Thread Alan Orth

All,

I've been working on a new GlusterFS deployment over the last week and I 
wrote up some of the experiences I had while trying to get the setup 
ready for production:


http://mjanja.co.ke/2013/03/troubleshooting-glusterfs-performance-issues/

In a nutshell, I was trying to figure out why my shiny new servers 
weren't giving me the speed I assumed they would out of the box 
(something I'm sure every new user goes through!).  It was quite 
enlightening to run through the various components looking for 
bottlenecks.  In the end I found that NFS was many times faster than 
FUSE... which seems to contradict what most people see.  I haven't yet 
deployed my Gluster, so there's still time for more architectural 
changes if new information comes in!


Hope it helps someone!  Comments and questions appreciated.  Thanks,

--
Alan Orth
alan.o...@gmail.com
http://alaninkenya.org
http://mjanja.co.ke
I have always wished for my computer to be as easy to use as my telephone; my wish 
has come true because I can no longer figure out how to use my telephone. -Bjarne 
Stroustrup, inventor of C++

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Slow read performance

2013-03-26 Thread Anand Avati
Sorry for the late reply. The call profiles look OK on the server side. I
suspect it still has something to do with the client or the network. Have you
mounted the FUSE client with any special options, such as --direct-io-mode?
That can have a significant impact on read performance, because read-ahead in
the page cache (which is far more efficient than gluster's read-ahead
translator, since it needs no context switch to serve the next page) is
effectively turned off.

I can't tell whether any of your networking (TCP/IP) configuration is
good or bad.

Avati

On Mon, Mar 11, 2013 at 9:02 AM, Thomas Wakefield tw...@cola.iges.org wrote:

 Is there a way to make a ramdisk support extended attributes?

 These are my current sysctl settings (and I have tried many different
 options):
 net.ipv4.ip_forward = 0
 net.ipv4.conf.default.rp_filter = 1
 net.ipv4.conf.default.accept_source_route = 0
 kernel.sysrq = 0
 kernel.core_uses_pid = 1
 net.ipv4.tcp_syncookies = 1
 kernel.msgmnb = 65536
 kernel.msgmax = 65536
 kernel.shmmax = 68719476736
 kernel.shmall = 4294967296
 kernel.panic = 5
 net.core.rmem_max = 67108864
 net.core.wmem_max = 67108864
 net.ipv4.tcp_rmem = 4096 87380 67108864
 net.ipv4.tcp_wmem = 4096 65536 67108864
 net.core.netdev_max_backlog = 25
 net.ipv4.tcp_congestion_control = htcp
 net.ipv4.tcp_mtu_probing = 1


 Here is the output from a dd write and dd read.

 [root@cpu_crew1 ~]# dd if=/dev/zero
 of=/shared/working/benchmark/test.cpucrew1 bs=512k count=10000 ; dd
 if=/shared/working/benchmark/test.cpucrew1 of=/dev/null bs=512k
 10000+0 records in
 10000+0 records out
 5242880000 bytes (5.2 GB) copied, 7.21958 seconds, 726 MB/s
 10000+0 records in
 10000+0 records out
 5242880000 bytes (5.2 GB) copied, 86.4165 seconds, 60.7 MB/s
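
As a quick check on the two dd results, the rates can be recomputed from the byte total and the elapsed times. Taking the transfer as 10,000 blocks of 512 KiB (5,242,880,000 bytes, which matches the 5.2 GB that dd prints) is an assumption; the elapsed times are the ones reported above.

```python
BLOCK_SIZE = 512 * 1024   # bs=512k
BLOCK_COUNT = 10_000      # assumption: matches the 5.2 GB total dd reports
TOTAL_BYTES = BLOCK_SIZE * BLOCK_COUNT

def throughput_mb_s(total_bytes, seconds):
    """Throughput in MB/s, using decimal megabytes as dd does."""
    return total_bytes / seconds / 1_000_000

write_rate = throughput_mb_s(TOTAL_BYTES, 7.21958)
read_rate = throughput_mb_s(TOTAL_BYTES, 86.4165)

print(f"write: {write_rate:.1f} MB/s")
print(f"read:  {read_rate:.1f} MB/s")
print(f"write/read ratio: {write_rate / read_rate:.1f}x")
```

On these numbers the read is roughly twelve times slower than the write.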



Re: [Gluster-users] glusterfs-3.2.x after rebalance, read failed

2013-03-26 Thread lierihanmei
I tested glusterfs 3.3.x. After the rebalance, the file distribution looks the
same as in 3.2.x. Luckily, all files can be accessed.
