Re: [Gluster-devel] Problems with graph switch in disperse
On 02.01.2015 05:45, Raghavendra G wrote:
> On Wed, Dec 31, 2014 at 11:25 PM, Xavier Hernandez wrote:
>> On 27.12.2014 13:43, l...@perabytes.com wrote:
>>> I tracked this problem and found that loc.parent and loc.pargfid are both NULL in the call sequence below:
>>>
>>> ec_manager_writev() -> ec_get_size_version() -> ec_lookup(). This can cause server_resolve() to return EINVAL.
>>>
>>> A replace-brick causes all opened fds and the inode table to be recreated, but ec_lookup() gets the loc from fd->_ctx, so loc.parent and loc.pargfid are missing after the fd changes. Other xlators always do a lookup from the root directory, so they never hit this problem. It seems that a recursive lookup from the root directory may address this issue.
>>
>> The EINVAL error is returned by protocol/server when it tries to resolve an inode based on a loc. If the loc's 'name' field is neither NULL nor empty, it tries to resolve the inode based on <pargfid>/<name>. The problem here is that pargfid is 00...00.
>>
>> To solve this issue I've modified ec_loc_setup_parent() so that it clears the loc's 'name' if the parent inode cannot be determined. This forces protocol/server to resolve the inode based on <gfid>, which is valid and can be resolved successfully.
>>
>> However this doesn't fully solve the bug. After fixing this issue, I get an EIO error. Further investigation seems to indicate that this is caused by a locking problem due to incorrect handling of ESTALE when the brick is replaced.
>
> ESTALE indicates either of the following situations:
>
> 1. In the case of a named lookup (loc containing <pargfid>/<name>), <name> is not present, which means the parent is not present on the brick.
> 2. In the case of a nameless lookup (loc containing only the <gfid> of the file), the file/directory represented by that gfid is not present on the brick.
>
> Which of the above two scenarios is your case?

In this particular case the problem is with the second scenario, however there are other combinations that could lead to the first one.
Basically, the root cause is that after replacing a brick the new brick is totally empty, so self-heal needs to recover the directory contents, but some in-flight operations may try to use gfids that were already resolved and that the new brick has never seen. In these cases the brick returns ESTALE, but ec incorrectly handled this as a fatal error while trying to acquire a lock, returning EIO for the full operation. I'll upload a patch to solve this problem.

Xavi

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
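The behavior described above — treating ESTALE from a freshly replaced brick as recoverable instead of failing the whole lock acquisition with EIO — can be sketched as follows. This is an illustrative model only: combine_lock_replies, its parameters, and the flat per-brick errno array are hypothetical simplifications, not the actual ec locking code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical sketch: combine per-brick lock replies. ESTALE from a
 * replaced (empty) brick is treated like a missing answer that self-heal
 * will repair later, rather than a fatal error. The operation only fails
 * with EIO when fewer than 'minimum' bricks granted the lock. */
static int combine_lock_replies(const int *op_errno, int bricks, int minimum)
{
    int granted = 0;

    for (int i = 0; i < bricks; i++) {
        if (op_errno[i] == 0) {
            granted++;                /* lock acquired on this brick */
        } else if (op_errno[i] == ESTALE) {
            continue;                 /* gfid unknown on replaced brick: skip */
        } else {
            return op_errno[i];       /* genuine error: propagate */
        }
    }

    return (granted >= minimum) ? 0 : EIO;
}
```

With this policy, one ESTALE among three bricks (minimum two) still succeeds, while two ESTALEs correctly degrade to EIO.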
Re: [Gluster-devel] Problems with graph switch in disperse
On Wed, Dec 31, 2014 at 11:25 PM, Xavier Hernandez wrote:
> [...]
>
> However this doesn't fully solve the bug. After fixing this issue, I get an EIO error. Further investigation seems to indicate that this is caused by a locking problem due to incorrect handling of ESTALE when the brick is replaced.

ESTALE indicates either of the following situations:

1. In the case of a named lookup (loc containing <pargfid>/<name>), <name> is not present, which means the parent is not present on the brick.
2. In the case of a nameless lookup (loc containing only the <gfid> of the file), the file/directory represented by that gfid is not present on the brick.

Which of the above two scenarios is your case?

> I'll upload a patch shortly to solve these issues.
Re: [Gluster-devel] Problems with graph switch in disperse
On 27.12.2014 13:43, l...@perabytes.com wrote:
> I tracked this problem and found that loc.parent and loc.pargfid are both NULL in the call sequence below:
>
> ec_manager_writev() -> ec_get_size_version() -> ec_lookup(). This can cause server_resolve() to return EINVAL.
>
> A replace-brick causes all opened fds and the inode table to be recreated, but ec_lookup() gets the loc from fd->_ctx, so loc.parent and loc.pargfid are missing after the fd changes. Other xlators always do a lookup from the root directory, so they never hit this problem. It seems that a recursive lookup from the root directory may address this issue.

The EINVAL error is returned by protocol/server when it tries to resolve an inode based on a loc. If the loc's 'name' field is neither NULL nor empty, it tries to resolve the inode based on <pargfid>/<name>. The problem here is that pargfid is 00...00.

To solve this issue I've modified ec_loc_setup_parent() so that it clears the loc's 'name' if the parent inode cannot be determined. This forces protocol/server to resolve the inode based on <gfid>, which is valid and can be resolved successfully.

However this doesn't fully solve the bug. After fixing this issue, I get an EIO error. Further investigation seems to indicate that this is caused by a locking problem due to incorrect handling of ESTALE when the brick is replaced.

I'll upload a patch shortly to solve these issues.

Xavi

> ----- Original Message -----
> From: Raghavendra Gowdappa
> Sent: 14-12-24 21:48:56
> To: Xavier Hernandez
> Cc: Gluster Devel
> Subject: Re: [Gluster-devel] Problems with graph switch in disperse
>
> Do you know the origins of EIO? fuse-bridge only fails a lookup fop with EIO (when a NULL gfid is received in a successful lookup reply). So, there might be another xlator sending EIO.
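The fix described above — clearing the loc's 'name' when the parent inode cannot be determined, so that protocol/server falls back to nameless (<gfid>-based) resolution — can be sketched roughly as follows. The types and loc_setup_parent_sketch() are simplified stand-ins for illustration, not the real loc_t or ec_loc_setup_parent().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the real GlusterFS types (hypothetical). */
typedef struct inode { int unused; } inode_t;

typedef struct {
    inode_t    *parent;       /* parent inode; NULL after replace-brick */
    uint8_t     pargfid[16];  /* parent gfid; all-zero when unknown */
    const char *name;         /* basename; non-NULL triggers named resolution */
    uint8_t     gfid[16];     /* gfid of the file itself */
} loc_sketch_t;

static int gfid_is_null(const uint8_t *gfid)
{
    static const uint8_t zero[16] = {0};
    return memcmp(gfid, zero, sizeof(zero)) == 0;
}

/* Sketch of the fix: if the parent cannot be determined, clear 'name' so
 * the server resolves by <gfid> (nameless lookup) instead of attempting a
 * <pargfid>/<name> resolution that would fail with EINVAL. */
static void loc_setup_parent_sketch(loc_sketch_t *loc)
{
    if (loc->parent == NULL || gfid_is_null(loc->pargfid))
        loc->name = NULL;
}
```

When the parent is known, the name is left untouched and named resolution proceeds as before.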
Re: [Gluster-devel] Problems with graph switch in disperse
I tracked this problem and found that loc.parent and loc.pargfid are both NULL in the call sequence below:

ec_manager_writev() -> ec_get_size_version() -> ec_lookup(). This can cause server_resolve() to return EINVAL.

A replace-brick causes all opened fds and the inode table to be recreated, but ec_lookup() gets the loc from fd->_ctx, so loc.parent and loc.pargfid are missing after the fd changes. Other xlators always do a lookup from the root directory, so they never hit this problem. It seems that a recursive lookup from the root directory may address this issue.

----- Original Message -----
From: Raghavendra Gowdappa
Sent: 14-12-24 21:48:56
To: Xavier Hernandez
Cc: Gluster Devel
Subject: Re: [Gluster-devel] Problems with graph switch in disperse

Do you know the origins of EIO? fuse-bridge only fails a lookup fop with EIO (when a NULL gfid is received in a successful lookup reply). So, there might be another xlator sending EIO.
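The failure mode analyzed above — a named resolution attempted with an all-zero pargfid — can be modeled with a small self-contained sketch. resolve_sketch() is a hypothetical simplification of the decision protocol/server's server_resolve() makes, not the real implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model of the server-side loc (hypothetical fields). */
typedef struct {
    uint8_t     pargfid[16];  /* parent gfid; all-zero in the bug above */
    uint8_t     gfid[16];     /* gfid of the file itself */
    const char *name;         /* basename; non-NULL selects named resolution */
} server_loc_t;

static int gfid_valid(const uint8_t *g)
{
    static const uint8_t zero[16] = {0};
    return memcmp(g, zero, sizeof(zero)) != 0;
}

/* Returns 0 when the loc can be resolved, EINVAL otherwise. */
static int resolve_sketch(const server_loc_t *loc)
{
    if (loc->name != NULL && loc->name[0] != '\0') {
        /* named resolution: needs a valid parent gfid */
        return gfid_valid(loc->pargfid) ? 0 : EINVAL;
    }
    /* nameless resolution: needs a valid gfid of the file itself */
    return gfid_valid(loc->gfid) ? 0 : EINVAL;
}
```

A loc with a name but a null pargfid (as produced from fd->_ctx after a replace-brick) fails with EINVAL, while the same loc with the name cleared resolves successfully by gfid.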
Re: [Gluster-devel] Problems with graph switch in disperse
Do you know the origins of EIO? fuse-bridge only fails a lookup fop with EIO (when a NULL gfid is received in a successful lookup reply). So, there might be another xlator sending EIO.

- Original Message -
> From: "Xavier Hernandez"
> To: "Gluster Devel"
> Sent: Wednesday, December 24, 2014 6:25:17 PM
> Subject: [Gluster-devel] Problems with graph switch in disperse
>
> [...]
>
> It seems that fuse still has a write request pending for graph 0. It is resumed but it returns EIO without calling the xlator stack (operations seen between the two log messages are from other operations and they are sent to graph 2). I'm not sure why this happens or how I should avoid it.
>
> I tried the same scenario with replicate and it seems to work, so there must be something wrong in disperse, but I don't see where the problem could be.
>
> Any ideas?
[Gluster-devel] Problems with graph switch in disperse
Hi,

I'm experiencing a problem when the gluster graph is changed as a result of a replace-brick operation (probably with any other operation that changes the graph) while the client is also doing other tasks, like writing a file.

When the operation starts, I see that the replaced brick is disconnected, but writes continue working normally with one brick less.

At some point, another graph is created and comes online. The remaining bricks on the old graph are disconnected and the old graph is destroyed. I see how new write requests are sent to the new graph.

This seems correct. However there's a point where I see this:

[2014-12-24 11:29:58.541130] T [fuse-bridge.c:2305:fuse_write_resume] 0-glusterfs-fuse: 2234: WRITE (0x16dcf3c, size=131072, offset=255721472)
[2014-12-24 11:29:58.541156] T [ec-helpers.c:101:ec_trace] 2-ec: WIND(INODELK) 0x7f8921b7a9a4(0x7f8921b78e14) [refs=5, winds=3, jobs=1] frame=0x7f8932e92c38/0x7f8932e9e6b0, min/exp=3/3, err=0 state=1 {111:000:000} idx=0
[2014-12-24 11:29:58.541292] T [rpc-clnt.c:1384:rpc_clnt_record] 2-patchy-client-0: Auth Info: pid: 0, uid: 0, gid: 0, owner: d025e932897f
[2014-12-24 11:29:58.541296] T [io-cache.c:133:ioc_inode_flush] 2-patchy-io-cache: locked inode(0x16d2810)
[2014-12-24 11:29:58.541354] T [rpc-clnt.c:1241:rpc_clnt_record_build_header] 2-rpc-clnt: Request fraglen 152, payload: 84, rpc hdr: 68
[2014-12-24 11:29:58.541408] T [io-cache.c:137:ioc_inode_flush] 2-patchy-io-cache: unlocked inode(0x16d2810)
[2014-12-24 11:29:58.541493] T [io-cache.c:133:ioc_inode_flush] 2-patchy-io-cache: locked inode(0x16d2810)
[2014-12-24 11:29:58.541536] T [io-cache.c:137:ioc_inode_flush] 2-patchy-io-cache: unlocked inode(0x16d2810)
[2014-12-24 11:29:58.541537] T [rpc-clnt.c:1577:rpc_clnt_submit] 2-rpc-clnt: submitted request (XID: 0x17 Program: GlusterFS 3.3, ProgVers: 330, Proc: 29) to rpc-transport (patchy-client-0)
[2014-12-24 11:29:58.541646] W [fuse-bridge.c:2271:fuse_writev_cbk] 0-glusterfs-fuse: 2234: WRITE => -1 (Input/output error)
It seems that fuse still has a write request pending for graph 0. It is resumed but it returns EIO without calling the xlator stack (operations seen between the two log messages are from other operations and they are sent to graph 2). I'm not sure why this happens or how I should avoid it.

I tried the same scenario with replicate and it seems to work, so there must be something wrong in disperse, but I don't see where the problem could be.

Any ideas?

Thanks,

Xavi