Hi,

I'm experiencing a problem when gluster graph is changed as a result of a replace-brick operation (probably with any other operation that changes the graph) while the client is also doing other tasks, like writing a file.

When the operation starts, I see that the replaced brick is disconnected, but writes continue working normally with one brick less.

At some point, another graph is created and comes online. Remaining bricks on the old graph are disconnected and the old graph is destroyed. I see how new write requests are sent to the new graph.

This seems correct. However, there's a point where I see this:

[2014-12-24 11:29:58.541130] T [fuse-bridge.c:2305:fuse_write_resume] 0-glusterfs-fuse: 2234: WRITE (0x16dcf3c, size=131072, offset=255721472)
[2014-12-24 11:29:58.541156] T [ec-helpers.c:101:ec_trace] 2-ec: WIND(INODELK) 0x7f8921b7a9a4(0x7f8921b78e14) [refs=5, winds=3, jobs=1] frame=0x7f8932e92c38/0x7f8932e9e6b0, min/exp=3/3, err=0 state=1 {111:000:000} idx=0
[2014-12-24 11:29:58.541292] T [rpc-clnt.c:1384:rpc_clnt_record] 2-patchy-client-0: Auth Info: pid: 0, uid: 0, gid: 0, owner: d025e932897f0000
[2014-12-24 11:29:58.541296] T [io-cache.c:133:ioc_inode_flush] 2-patchy-io-cache: locked inode(0x16d2810)
[2014-12-24 11:29:58.541354] T [rpc-clnt.c:1241:rpc_clnt_record_build_header] 2-rpc-clnt: Request fraglen 152, payload: 84, rpc hdr: 68
[2014-12-24 11:29:58.541408] T [io-cache.c:137:ioc_inode_flush] 2-patchy-io-cache: unlocked inode(0x16d2810)
[2014-12-24 11:29:58.541493] T [io-cache.c:133:ioc_inode_flush] 2-patchy-io-cache: locked inode(0x16d2810)
[2014-12-24 11:29:58.541536] T [io-cache.c:137:ioc_inode_flush] 2-patchy-io-cache: unlocked inode(0x16d2810)
[2014-12-24 11:29:58.541537] T [rpc-clnt.c:1577:rpc_clnt_submit] 2-rpc-clnt: submitted request (XID: 0x17 Program: GlusterFS 3.3, ProgVers: 330, Proc: 29) to rpc-transport (patchy-client-0)
[2014-12-24 11:29:58.541646] W [fuse-bridge.c:2271:fuse_writev_cbk] 0-glusterfs-fuse: 2234: WRITE => -1 (Input/output error)

It seems that fuse still has a write request pending on graph 0. The request is resumed, but it returns EIO without even calling the xlator stack (the operations seen between the two log messages belong to other requests and are sent to graph 2). I'm not sure why this happens or how I should avoid it.

I tried the same scenario with replicate and it seems to work, so the problem must be somewhere in disperse, but I don't see where it could be.

Any ideas?

Thanks,

Xavi
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel