Re: [Gluster-devel] Possible bug in the communications layer ?
I've uploaded a patch for this problem: http://review.gluster.org/14270

Any review will be very much appreciated :)

Thanks,

Xavi

On 09/05/16 12:35, Raghavendra Gowdappa wrote:

----- Original Message -----
From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Monday, May 9, 2016 3:07:16 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?

> Hi Raghavendra,
>
> I've finally found the bug. It was obvious but I didn't see it.

Same here :).

> 1561         case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
> 1562                 default_read_size = xdr_sizeof ((xdrproc_t) xdr_gfs3_read_rsp,
> 1563                                                 &read_rsp);
> 1564
> 1565                 proghdr_buf = frag->fragcurrent;
> 1566
> 1567                 __socket_proto_init_pending (priv, default_read_size);
> 1568
> 1569                 frag->call_body.reply.accepted_success_state
> 1570                         = SP_STATE_READING_PROC_HEADER;
> 1571
> 1572                 /* fall through */
> 1573
> 1574         case SP_STATE_READING_PROC_HEADER:
> 1575                 __socket_proto_read (priv, ret);
> 1576
> 1577                 gf_trace_add ("xdrmem_create", default_read_size, (uintptr_t)proghdr_buf);
> 1578                 /* there can be 'xdata' in read response, figure it out */
> 1579                 xdrmem_create (&xdr, proghdr_buf, default_read_size,
> 1580                                XDR_DECODE);
> 1581
> 1582                 /* This will fail if there is xdata sent from server, if not,
> 1583                    well and good, we don't need to worry about */
> 1584                 xdr_gfs3_read_rsp (&xdr, &read_rsp);
> 1585
> 1586                 free (read_rsp.xdata.xdata_val);
> 1587
> 1588                 /* need to round off to proper roof (%4), as XDR packing pads
> 1589                    the end of opaque object with '0' */
> 1590                 size = roof (read_rsp.xdata.xdata_len, 4);
> 1591
> 1592                 if (!size) {
> 1593                         frag->call_body.reply.accepted_success_state
> 1594                                 = SP_STATE_READ_PROC_OPAQUE;
> 1595                         goto read_proc_opaque;
> 1596                 }
> 1597
> 1598                 __socket_proto_init_pending (priv, size);
> 1599
> 1600                 frag->call_body.reply.accepted_success_state
> 1601                         = SP_STATE_READING_PROC_OPAQUE;
>
> The main problem here is that we are using two local variables (proghdr_buf
> and default_read_size) in two distinct states that might be executed at
> different times.
>
> The particular case that is failing is the following:
>
> 1. In state SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT, everything is prepared to
>    read 116 bytes. default_read_size is set to 116 and proghdr_buf points
>    to the buffer where data will be written.
>
> 2. In state SP_STATE_READING_PROC_HEADER, a partial read of 88 bytes is
>    done. At this point the function returns, and proghdr_buf and
>    default_read_size are lost.
>
> 3. When more data is available, this function is called again and it starts
>    executing at state SP_STATE_READING_PROC_HEADER.
>
> 4. The remaining 28 bytes are read.
>
> 5. When it checks the buffer and tries to decode it to see if there's xdata
>    present, it uses the default values of proghdr_buf and default_read_size,
>    which are 0. This causes the decode to leave read_rsp.xdata.xdata_len
>    set to 0.
>
> 6. The program interprets xdata_len being 0 to mean that there's no xdata,
>    so it continues reading the remainder of the RPC packet into the payload
>    buffer.
>
> If you want, I can send a patch for this.

Yes. That would be helpful. The analysis is correct, and moving the initialization of proghdr_buf to line 1578 will fix the issue. If you are too busy, please let me know and I can patch it up too :). Thanks for debugging the issue :).

regards,
Raghavendra.

> Xavi
>
> On 05/05/16 10:21, Xavier Hernandez wrote:
> > I've undone all changes and now I'm unable to reproduce the problem, so
> > the modification I did is probably incorrect and not the root cause, as
> > you described.
> >
> > I'll continue investigating...
> >
> > Xavi
Re: [Gluster-devel] Possible bug in the communications layer ?
On 04/05/16 14:47, Raghavendra Gowdappa wrote:

----- Original Message -----
From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Wednesday, May 4, 2016 5:37:56 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?

> I think I've found the problem.
>
> 1567         case SP_STATE_READING_PROC_HEADER:
> 1568                 __socket_proto_read (priv, ret);
> 1569
> 1570                 /* there can be 'xdata' in read response, figure it out */
> 1571                 xdrmem_create (&xdr, proghdr_buf, default_read_size,
> 1572                                XDR_DECODE);
> 1573
> 1574                 /* This will fail if there is xdata sent from server, if not,
> 1575                    well and good, we don't need to worry about */
> 1576                 xdr_gfs3_read_rsp (&xdr, &read_rsp);
> 1577
> 1578                 free (read_rsp.xdata.xdata_val);
> 1579
> 1580                 /* need to round off to proper roof (%4), as XDR packing pads
> 1581                    the end of opaque object with '0' */
> 1582                 size = roof (read_rsp.xdata.xdata_len, 4);
> 1583
> 1584                 if (!size) {
> 1585                         frag->call_body.reply.accepted_success_state
> 1586                                 = SP_STATE_READ_PROC_OPAQUE;
> 1587                         goto read_proc_opaque;
> 1588                 }
> 1589
> 1590                 __socket_proto_init_pending (priv, size);
> 1591
> 1592                 frag->call_body.reply.accepted_success_state
> 1593                         = SP_STATE_READING_PROC_OPAQUE;
> 1594
> 1595         case SP_STATE_READING_PROC_OPAQUE:
> 1596                 __socket_proto_read (priv, ret);
> 1597
> 1598                 frag->call_body.reply.accepted_success_state
> 1599                         = SP_STATE_READ_PROC_OPAQUE;
>
> On line 1568 we read, at most, 116 bytes because we calculate the size of a
> read response without xdata. Then we detect that we really need more data
> for xdata (BTW, will read_rsp.xdata.xdata_val always be allocated even if
> xdr_gfs3_read_rsp() fails ?)

No. It need not be. It's guaranteed to be allocated only on a successful completion. However, _if_ decoding fails only because the xdr stream doesn't include the xdata bits, but xdata_len is zero (by initializing it to default_read_size), then the xdr library would've filled read_rsp.xdata.xdata_len (read_rsp.xdata.xdata_val can still be NULL).

The question is: is it guaranteed that after an unsuccessful completion xdata_val will be NULL (i.e. not touched by the function, even if xdata_len is != 0)? Otherwise the free() could corrupt memory.

> So we get into line 1596 with the pending info initialized to read the
> remaining data. This is the __socket_proto_read macro:
>
>  166 /* This will be used in a switch case and breaks from the switch case if all
>  167  * the pending data is not read.
>  168  */
>  169 #define __socket_proto_read(priv, ret)                                  \
>  170 {                                                                       \
>  171         size_t bytes_read = 0;                                          \
>  172         struct gf_sock_incoming *in;                                    \
>  173         in = &priv->incoming;                                           \
>  174                                                                         \
>  175         __socket_proto_update_pending (priv);                           \
>  176                                                                         \
>  177         ret = __socket_readv (this,                                     \
>  178                               in->pending_vector, 1,                    \
>  179                               &in->pending_vector,                      \
>  180                               &in->pending_count,                       \
>  181                               &bytes_read);                             \
>  182         if (ret == -1)                                                  \
>  183                 break;                                                  \
>  184         __socket_proto_update_priv_after_read (priv, ret, bytes_read);  \
>  185 }
>
> We read from the socket using __socket_readv(). If it fails, we quit.
> However, if the socket doesn't have more data to read, this function does
> not return -1:
>
>  555                         ret = __socket_cached_read (this, opvector, opcount);
>  556
>  557                         if (ret == 0) {
>  558                                 gf_log (this->name, GF_LOG_DEBUG, "EOF on socket");
>  559                                 errno = ENODATA;
>  560                                 ret = -1;
>  561                         }
>  562                         if (ret == -1 && errno == EAGAIN) {
>  563                                 /* done for now */
>  564                                 break;
>  565                         }
>  566                         this->total_bytes_read += ret;
>
> If __socket_cached_read() fails with errno == EAGAIN, we break and return
> opcount, which is >= 0, causing the process to continue instead of waiting
> for more data.

No. If you observe, there is a call to another macro, __socket_proto_update_priv_after_read ...
Re: [Gluster-devel] Possible bug in the communications layer ?
----- Original Message -----
From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 5:14:02 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?

> ----- Original Message -----
> From: "Xavier Hernandez" <xhernan...@datalab.es>
> To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Friday, April 29, 2016 1:21:57 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
>
> Hi Raghavendra,
>
> yes, the readv response contains xdata. The dict length is 38 (0x26) and,
> at the moment of failure, rsp.xdata.xdata_len already contains 0x26.

rsp.xdata.xdata_len having 0x26 even when decoding failed indicates that the approach used in socket.c to get the length of xdata is correct. However, I cannot find any other way of xdata going into the payload vector other than xdata_len being zero. Just to be double sure, I've a patch containing a debug message that prints xdata_len when decoding fails in socket.c. Can you please apply the patch, run the tests and revert back with results?

> Xavi
>
> On 29/04/16 09:10, Raghavendra Gowdappa wrote:
> > ----- Original Message -----
> > From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> > To: "Xavier Hernandez" <xhernan...@datalab.es>
> > Cc: "Gluster Devel" <gluster-devel@gluster.org>
> > Sent: Friday, April 29, 2016 12:36:43 PM
> > Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> >
> > ----- Original Message -----
> > From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> > To: "Xavier Hernandez" <xhernan...@datalab.es>
> > Cc: "Jeff Darcy" <jda...@redhat.com>, "Gluster Devel" <gluster-devel@gluster.org>
> > Sent: Friday, April 29, 2016 12:07:59 PM
> > Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> >
> > ----- Original Message -----
> > From: "Xavier Hernandez" <xhernan...@datalab.es>
> > To: "Jeff Darcy" <jda...@redhat.com>
> > Cc: "Gluster Devel" <gluster-devel@gluster.org>
> > Sent: Thursday, April 28, 2016 8:15:36 PM
> > Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> >
> > Hi Jeff,
> >
> > On 28.04.2016 15:20, Jeff Darcy wrote:
> >
> > This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> > volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
> > problem easily doing the following test:
> >
> >     iozone -t2 -s10g -r1024k -i0 -w -F/iozone{1..2}.dat
> >     echo 3 >/proc/sys/vm/drop_caches
> >     iozone -t2 -s10g -r1024k -i1 -w -F/iozone{1..2}.dat
> >
> > The error happens soon after starting the read test. As can be seen in
> > the data below, client3_3_readv_cbk() is processing an iovec of 116
> > bytes, however it should be of 154 bytes (the buffer in memory really
> > seems to contain 154 bytes). The data on the network seems ok (at least
> > I haven't been able to identify any problem), so this must be a
> > processing error on the client side. The last field in the cut buffer of
> > the serialized data corresponds to the length of the xdata field: 0x26.
> > So at least 38 more bytes should be present.
> >
> > Nice detective work, Xavi. It would be *very* interesting to see what
> > the value of the "count" parameter is (it's unfortunately optimized
> > out). I'll bet it's two, and iov[1].iov_len is 38. I have a weak memory
> > of some problems with how this iov is put together, a couple of years
> > ago, and it looks like you might have tripped over one more.
> >
> > It seems you are right. The count is 2 and the first 38 bytes of the
> > second vector contain the remaining data of the xdata field.

This is the bug. client3_3_readv_cbk (and for that matter all the actors/cbks) expects the response in at most two vectors:

1. The program header containing the request or response. This is subjected to decoding/encoding. This vector should point to a buffer that contains the entire program header/response contiguously.

2. If the procedure returns a payload (like a readv response or a write request), the second vector contains the buffer pointing to the entire (contiguous) payload. Note that this payload is raw and is not subjected to encoding/decoding.
In your case, this _clean_ separation is broken, with part of the program header slipping into the 2nd vector that is supposed to contain read data (maybe because of RPC fragmentation). I think this is a bug in the socket layer. I'll update more on this. Does your read response include xdata too? I think the code related to reading xdata in the readv response is a bit murky.
Re: [Gluster-devel] Possible bug in the communications layer ?
With your patch applied, it seems that the bug is not hit. I guess it's a timing issue that the new logging hides. Maybe no more data is available after reading the partial readv header? (It will arrive later.)

I'll continue testing...

Xavi

On 29/04/16 13:48, Raghavendra Gowdappa wrote:
> Attaching the patch.
Re: [Gluster-devel] Possible bug in the communications layer ?
- Original Message - > From: "Xavier Hernandez" <xhernan...@datalab.es> > To: "Raghavendra Gowdappa" <rgowd...@redhat.com> > Cc: "Gluster Devel" <gluster-devel@gluster.org> > Sent: Friday, April 29, 2016 1:21:57 PM > Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > > Hi Raghavendra, > > yes, the readv response contains xdata. The dict length is 38 (0x26) > and, at the moment of failure, rsp.xdata.xdata_len already contains 0x26. rsp.xdata.xdata_len having 0x26 even when decoding failed indicates that the approach used in socket.c to get the length of xdata is correct. However, I cannot find any other way of xdata going into payload vector other than xdata_len being zero. Just to be double sure, I've a patch containing debug message printing xdata_len when decoding fails in socket.c. Can you please apply the patch, run the tests and revert back with results? > > Xavi > > On 29/04/16 09:10, Raghavendra Gowdappa wrote: > > > > > > - Original Message - > >> From: "Raghavendra Gowdappa" <rgowd...@redhat.com> > >> To: "Xavier Hernandez" <xhernan...@datalab.es> > >> Cc: "Gluster Devel" <gluster-devel@gluster.org> > >> Sent: Friday, April 29, 2016 12:36:43 PM > >> Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > >> > >> > >> > >> - Original Message - > >>> From: "Raghavendra Gowdappa" <rgowd...@redhat.com> > >>> To: "Xavier Hernandez" <xhernan...@datalab.es> > >>> Cc: "Jeff Darcy" <jda...@redhat.com>, "Gluster Devel" > >>> <gluster-devel@gluster.org> > >>> Sent: Friday, April 29, 2016 12:07:59 PM > >>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > >>> > >>> > >>> > >>> - Original Message - > >>>> From: "Xavier Hernandez" <xhernan...@datalab.es> > >>>> To: "Jeff Darcy" <jda...@redhat.com> > >>>> Cc: "Gluster Devel" <gluster-devel@gluster.org> > >>>> Sent: Thursday, April 28, 2016 8:15:36 PM > >>>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ? 
Hi Raghavendra,

yes, the readv response contains xdata. The dict length is 38 (0x26) and, at the moment of failure, rsp.xdata.xdata_len already contains 0x26.

Xavi

On 29/04/16 09:10, Raghavendra Gowdappa wrote:
> It seems you are right. The count is 2 and the first 38 bytes of the
> second vector contain the remaining data of the xdata field.

This is the bug. client3_3_readv_cbk (and, for that matter, all the actors/cbks) expects the response in at most two vectors:

1. The program header containing the request or response. This is subject to encoding/decoding, and this vector should point to a buffer that contains the entire program header/response contiguously.
2. If the procedure carries a payload (like a readv response or a write request), the second vector points to the buffer containing the entire (contiguous) payload. Note that this payload is raw and is not subject to encoding/decoding.

In your case this _clean_ separation is broken, with part of the program header slipping into the second vector that is supposed to contain only the read data (maybe because of rpc fragmentation). I think this is a bug in the socket layer. I'll update more on this.

Does your read response include xdata too? I think the code related to reading xdata in the readv response is a bit murky.
        case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
                default_read_size = xdr_sizeof ((xdrproc_t) xdr_gfs3_read_rsp,
                                                &read_rsp);

                proghdr_buf = frag->fragcurrent;

                __socket_proto_init_pending (priv, default_read_size);

                frag->call_body.reply.accepted_success_state
                        = SP_STATE_READING_PROC_HEADER;

                /* fall through */

        case SP_STATE_READING_PROC_HEADER:
                __socket_proto_read (priv, ret);

By this time we've read the readv response _minus_ the xdata (I meant: we have read the "readv response header").

                /* there can be 'xdata' in read response, figure it out */
                xdrmem_create (&xdr, proghdr_buf, default_read_size,
                               XDR_DECODE);

We created the xdr stream above with "default_read_size" (this doesn't include xdata).

                /* This will fail if there is xdata sent from server, if not,
                   well and good, we don't need to worry about */
                xdr_gfs3_read_rsp (&xdr, &read_rsp);

What if xdata is present and decoding failed (as the length of the xdr stream above - default_read_size - doesn't include xdata)? Would we have a valid value in read_rsp.xdata.xdata_len? This is the part I am confused about. If read_rsp.xdata.xdata_len is not correct then there is a possibility that ...
Thanks xavi.

@Prasanna/Rafi/Rtalur, we had earlier seen "decoding failed" errors in quotad in another user setup. Possibly these are related. Can you please take a look?

regards,
Raghavendra

- Original Message -
> From: "Xavier Hernandez"
> To: "Gluster Devel"
> Sent: Thursday, April 28, 2016 6:13:01 PM
> Subject: [Gluster-devel] Possible bug in the communications layer ?
>
> Hi,
>
> I've seen what seems a bug in the communications layer. The first sign
> is an "XDR decoding failed" error in the logs.
>
> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> volume is a distributed-disperse 4*(4+2).
>
> I'm able to reproduce the problem easily doing the following test:
>
> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
> echo 3 >/proc/sys/vm/drop_caches
> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
>
> The error happens soon after starting the read test.
>
> As can be seen in the data below, client3_3_readv_cbk() is processing an
> iovec of 116 bytes, however it should be of 154 bytes (the buffer in
> memory really seems to contain 154 bytes). The data on the network seems
> ok (at least I haven't been able to identify any problem), so this must
> be a processing error on the client side.
>
> The last field in the cut buffer of the serialized data corresponds to
> the length of the xdata field: 0x26. So at least 38 more bytes should be
> present.
>
> My guess is that some corner case is hit reading fragmented network
> packets due to a high load.
>
> Debug information:
>
> Breakpoint 1, client3_3_readv_cbk (req=0x7f540e64106c,
> iov=0x7f540e6410ac, count=<optimized out>, myframe=0x7f54259a4d54)
> at client-rpc-fops.c:3021
> 3021            gf_msg (this->name, GF_LOG_ERROR, EINVAL,
> (gdb) print *iov
> $1 = {iov_base = 0x7f53e994e018, iov_len = 116}
> (gdb) x/116xb 0x7f53e994e018
> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
> 0x7f53e994e028: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e030: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
> 0x7f53e994e040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
> 0x7f53e994e048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
> 0x7f53e994e060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e068: 0x00 0x50 0x00 0x00 0x57 0x22 0x04 0x1f
> 0x7f53e994e070: 0x25 0x38 0x92 0x91 0x57 0x22 0x04 0xb3
> 0x7f53e994e078: 0x03 0x53 0x1b 0x13 0x57 0x22 0x04 0xb3
> 0x7f53e994e080: 0x06 0xf5 0xe1 0x99 0x00 0x00 0x80 0x00
> 0x7f53e994e088: 0x00 0x00 0x00 0x26
> (gdb) x/154xb 0x7f53e994e018
> [first 116 bytes identical to the dump above, followed by:]
> 0x7f53e994e088: 0x00 0x00 0x00 0x26 0x00 0x00 0x00 0x01
> 0x7f53e994e090: 0x00 0x00 0x00 0x17 0x00 0x00 0x00 0x02
> 0x7f53e994e098: 0x67 0x6c 0x75 0x73 0x74 0x65 0x72 0x66
> 0x7f53e994e0a0: 0x73 0x2e 0x69 0x6e 0x6f 0x64 0x65 0x6c
> 0x7f53e994e0a8: 0x6b 0x2d 0x63 0x6f 0x75 0x6e 0x74 0x00
> 0x7f53e994e0b0: 0x31 0x00
> (gdb) bt
> #0  client3_3_readv_cbk (req=0x7f540e64106c, iov=0x7f540e6410ac,
>     count=<optimized out>, myframe=0x7f54259a4d54) at client-rpc-fops.c:3021
> #1  0x7f542a677ab0 in rpc_clnt_handle_reply
>     (clnt=clnt@entry=0x7f54101cdef0, pollin=pollin@entry=0x7f5491f0) at
>     rpc-clnt.c:764
> #2  0x7f542a677d6f in rpc_clnt_notify (trans=, mydata=0x7f54101cdf20, event=,
I've filed bug https://bugzilla.redhat.com/show_bug.cgi?id=1331502 and added Raghavendra Gowdappa to the CC list (he appears as a maintainer of RPC).

Xavi
Hi Niels,

On 28.04.2016 15:44, Niels de Vos wrote:
> On Thu, Apr 28, 2016 at 02:43:01PM +0200, Xavier Hernandez wrote:
>> Hi, I've seen what seems a bug in the communications layer. The first
>> sign is an "XDR decoding failed" error in the logs. This happens with
>> Gluster 3.7.11 accessed through Ganesha and gfapi. The volume is a
>> distributed-disperse 4*(4+2). I'm able to reproduce the problem easily
>> doing the following test:
>>
>> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
>> echo 3 >/proc/sys/vm/drop_caches
>> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
>>
>> The error happens soon after starting the read test.
>
> Do you know if this only happens on disperse volumes, or also with
> others?

I have only seen the problem with EC. I'm not sure if that's because I do most of my tests with EC or because other volume types don't manifest the problem. However, I've only seen this since I started testing Ganesha; I have never seen the problem with FUSE.

I think the reason is that FUSE is slower than Ganesha (little more than 120 MB/s vs 400 MB/s), so the combination of events needed to cause this problem would be much more unlikely on FUSE. Since EC also talks to many bricks simultaneously (6 in this case), it may be more sensitive to communication problems than a replicated volume.

> If you have captured a network trace, could you provide it to me? You
> can use 'editcap -s0 ...' to copy only the relevant packets. But I
> don't have an issue with downloading a few GB either, if that is easier
> for you.
>
>> As can be seen in the data below, client3_3_readv_cbk() is processing
>> an iovec of 116 bytes, however it should be of 154 bytes (the buffer
>> in memory really seems to contain 154 bytes). The data on the network
>> seems ok (at least I haven't been able to identify any problem), so
>> this must be a processing error on the client side. The last field in
>> the cut buffer of the serialized data corresponds to the length of the
>> xdata field: 0x26.
>> So at least 38 more bytes should be present. My guess is that some
>> corner case is hit reading fragmented network packets due to a high
>> load.
>>
>> Debug information:
>>
>> Breakpoint 1, client3_3_readv_cbk (req=0x7f540e64106c,
>> iov=0x7f540e6410ac, count=<optimized out>, myframe=0x7f54259a4d54)
>> at client-rpc-fops.c:3021
>> 3021            gf_msg (this->name, GF_LOG_ERROR, EINVAL,
>> (gdb) print *iov
>> $1 = {iov_base = 0x7f53e994e018, iov_len = 116}
>> (gdb) x/116xb 0x7f53e994e018
>> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
>> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
>
> Hmm, I'm not sure how this is laid out in memory. 0x80 would be one of
> the first bytes in the RPC payload; it signals 'last record' for the RPC
> procedure, and we only send one record anyway. The four bytes combined
> like (0x80 0x.. 0x.. 0x..) should be (0x80 | rpc-record-size). Reading
> this in Wireshark from a .pcap.gz is much easier :)

The RPC header is already decoded here. The 116 bytes are only the content of the readv answer, as decoded by the xdr_gfs3_read_rsp() function.

This means that the first 4 bytes are the op_ret, the next 4 are op_errno, followed by an encoded iatt (with the gfid and inode number as the first fields). Then comes a size field and the length of a stream of bytes (corresponding to the encoded xdata).

I've a network capture.
I can upload it if you want, but I think this is what you are trying to see:

(gdb) f 1
#1  0x7fa88cb5bab0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fa870413e30, pollin=pollin@entry=0x7fa86400efe0) at rpc-clnt.c:764
764         req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
(gdb) print *pollin
$1 = {vector = {{iov_base = 0x7fa7ef58, iov_len = 140}, {iov_base = 0x7fa7ef46, iov_len = 32808}, {iov_base = 0x0, iov_len = 0} }, count = 2,
  vectored = 1 '\001', private = 0x7fa86c00ec60, iobref = 0x7fa86400e560, hdr_iobuf = 0x7fa868022ee0, is_reply = 1 '\001'}
(gdb) x/140xb 0x7fa7ef58
0x7fa7ef58:     0x00 0x00 0x89 0x68 0x00 0x00 0x00 0x01
0x7fa7ef580008: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580010: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580020: 0xc3 0x8e 0x35 0xf0 0x31 0xa1 0x45 0x01
0x7fa7ef580028: 0x8a 0x21 0x06 0x4b 0x08 0x4c 0x59 0xdf
0x7fa7ef580030: 0x8a 0x21 0x06 0x4b 0x08 0x4c 0x59 0xdf
0x7fa7ef580038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
0x7fa7ef580040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
0x7fa7ef580048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
0x7fa7ef580060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580068: 0x00 0x50 0x00 0x00 0x57 0x22 0x33 0x1f
0x7fa7ef580070: 0x24 0x0c 0xc2 0x58 0x57 0x22 0x33 0xa1
0x7fa7ef580078: 0x34 0x11 0x3b 0x54 0x57 0x22 0x33 0xa1
0x7fa7ef580080: 0x34 0x20 0x7d 0x7a 0x00 0x00 0x80 0x00
0x7fa7ef580088: 0x00 0x00 0x00 0x26
(gdb) f 3
#3  0x7fa88cb57853 in rpc_transport_notify (this=this@entry=0x7fa870423ab0,
Re: [Gluster-devel] Possible bug in the communications layer ?
Hi Jeff,

On 28.04.2016 15:20, Jeff Darcy wrote:
>> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi.
>> The volume is a distributed-disperse 4*(4+2).
>>
>> I'm able to reproduce the problem easily doing the following test:
>>
>> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
>> echo 3 >/proc/sys/vm/drop_caches
>> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
>>
>> The error happens soon after starting the read test. As can be seen in
>> the data below, client3_3_readv_cbk() is processing an iovec of 116
>> bytes, however it should be of 154 bytes (the buffer in memory really
>> seems to contain 154 bytes). The data on the network seems ok (at least
>> I haven't been able to identify any problem), so this must be a
>> processing error on the client side. The last field in the cut buffer
>> of the serialized data corresponds to the length of the xdata field:
>> 0x26. So at least 38 more bytes should be present.
>
> Nice detective work, Xavi.  It would be *very* interesting to see what
> the value of the "count" parameter is (it's unfortunately optimized out).
> I'll bet it's two, and iov[1].iov_len is 38.  I have a weak memory of
> some problems with how this iov is put together, a couple of years ago,
> and it looks like you might have tripped over one more.

It seems you are right. The count is 2, and the first 38 bytes of the
second vector contain the remaining data of the xdata field.
The rest of the data in the second vector seems to be the payload of the
readv fop, plus 2 bytes of padding:

(gdb) f 0
#0  client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c,
    count=<optimized out>, myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
3021        gf_msg (this->name, GF_LOG_ERROR, EINVAL,
(gdb) print *iov
$2 = {iov_base = 0x7fdc14b0d018, iov_len = 116}
(gdb) f 1
#1  0x7fdc56dafab0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fdc3c1f4bb0,
    pollin=pollin@entry=0x7fdc34010f20) at rpc-clnt.c:764
764         req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
(gdb) print *pollin
$3 = {vector = {{iov_base = 0x7fdc14b0d000, iov_len = 140},
  {iov_base = 0x7fdc14a4d000, iov_len = 32808},
  {iov_base = 0x0, iov_len = 0} }, count = 2, vectored = 1 '\001',
  private = 0x7fdc340106c0, iobref = 0x7fdc34006660,
  hdr_iobuf = 0x7fdc3c4c07c0, is_reply = 1 '\001'}
(gdb) f 0
#0  client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c,
    count=<optimized out>, myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
3021        gf_msg (this->name, GF_LOG_ERROR, EINVAL,
(gdb) print iov[1]
$4 = {iov_base = 0x7fdc14a4d000, iov_len = 32808}
(gdb) print iov[2]
$5 = {iov_base = 0x2, iov_len = 140583741974112}
(gdb) x/128xb 0x7fdc14a4d000
0x7fdc14a4d000: 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17
0x7fdc14a4d008: 0x00 0x00 0x00 0x02 0x67 0x6c 0x75 0x73
0x7fdc14a4d010: 0x74 0x65 0x72 0x66 0x73 0x2e 0x69 0x6e
0x7fdc14a4d018: 0x6f 0x64 0x65 0x6c 0x6b 0x2d 0x63 0x6f
0x7fdc14a4d020: 0x75 0x6e 0x74 0x00 0x31 0x00 0x00 0x00
0x7fdc14a4d028: 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c
0x7fdc14a4d030: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d038: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d040: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d048: 0x5c 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d058: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d060: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d068: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d070: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d078: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

> Maybe it's related to all that epoll stuff.

I'm currently using 4 epoll threads (this improves ec performance). I'll
try to repeat the tests with a single epoll thread, but I'm not sure a
single run will be conclusive if the problem doesn't manifest, since ec
through fuse with 4 epoll threads doesn't seem to trigger the problem
either.

Xavi
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Possible bug in the communications layer ?
On Thu, Apr 28, 2016 at 02:43:01PM +0200, Xavier Hernandez wrote:
> Hi,
>
> I've seen what seems a bug in the communications layer. The first sign is
> an "XDR decoding failed" error in the logs.
>
> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> volume is a distributed-disperse 4*(4+2).
>
> I'm able to reproduce the problem easily doing the following test:
>
> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
> echo 3 >/proc/sys/vm/drop_caches
> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
>
> The error happens soon after starting the read test.

Do you know if this only happens on disperse volumes, or also with others?

If you have captured a network trace, could you provide it to me? You can
use 'editcap -s0 ...' to copy only the relevant packets. But I don't have
a problem with downloading a few GB either, if that is easier for you.

> As can be seen in the data below, client3_3_readv_cbk() is processing an
> iovec of 116 bytes, however it should be of 154 bytes (the buffer in
> memory really seems to contain 154 bytes). The data on the network seems
> ok (at least I haven't been able to identify any problem), so this must
> be a processing error on the client side.
>
> The last field in the cut buffer of the serialized data corresponds to
> the length of the xdata field: 0x26. So at least 38 more bytes should be
> present.
>
> My guess is that some corner case is hit reading fragmented network
> packets due to a high load.
>
> Debug information:
>
> Breakpoint 1, client3_3_readv_cbk (req=0x7f540e64106c, iov=0x7f540e6410ac,
>     count=<optimized out>, myframe=0x7f54259a4d54) at client-rpc-fops.c:3021
> 3021        gf_msg (this->name, GF_LOG_ERROR, EINVAL,
> (gdb) print *iov
> $1 = {iov_base = 0x7f53e994e018, iov_len = 116}
> (gdb) x/116xb 0x7f53e994e018
> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e

Hmm, I'm not sure how this is laid out in memory. 0x80 would be one of the
first bytes in the RPC payload; it signals 'last record' for the RPC
procedure, and we only send one record anyway. The four bytes combined
like (0x80 0x.. 0x.. 0x..) should be (0x80 | rpc-record-size). Reading
this in Wireshark from a .pcap.gz is much easier :)

Don't hesitate to file a bug for this, and include the exact error
messages you get. If possible with a network capture, but if you cannot
share that publicly, please get it directly to me.

Thanks, Niels

> 0x7f53e994e028: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e030: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
> 0x7f53e994e040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
> 0x7f53e994e048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
> 0x7f53e994e060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e068: 0x00 0x50 0x00 0x00 0x57 0x22 0x04 0x1f
> 0x7f53e994e070: 0x25 0x38 0x92 0x91 0x57 0x22 0x04 0xb3
> 0x7f53e994e078: 0x03 0x53 0x1b 0x13 0x57 0x22 0x04 0xb3
> 0x7f53e994e080: 0x06 0xf5 0xe1 0x99 0x00 0x00 0x80 0x00
> 0x7f53e994e088: 0x00 0x00 0x00 0x26
> (gdb) x/154xb 0x7f53e994e018
> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
> 0x7f53e994e028: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e030: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
> 0x7f53e994e040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
> 0x7f53e994e048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
> 0x7f53e994e060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e068: 0x00 0x50 0x00 0x00 0x57 0x22 0x04 0x1f
> 0x7f53e994e070: 0x25 0x38 0x92 0x91 0x57 0x22 0x04 0xb3
> 0x7f53e994e078: 0x03 0x53 0x1b 0x13 0x57 0x22 0x04 0xb3
> 0x7f53e994e080: 0x06 0xf5 0xe1 0x99 0x00 0x00 0x80 0x00
> 0x7f53e994e088: 0x00 0x00 0x00 0x26 0x00 0x00 0x00 0x01
> 0x7f53e994e090: 0x00 0x00 0x00 0x17 0x00 0x00 0x00 0x02
> 0x7f53e994e098: 0x67 0x6c 0x75 0x73 0x74 0x65 0x72 0x66
> 0x7f53e994e0a0: 0x73 0x2e 0x69 0x6e 0x6f 0x64 0x65 0x6c
> 0x7f53e994e0a8: 0x6b 0x2d
Re: [Gluster-devel] Possible bug in the communications layer ?
> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> volume is a distributed-disperse 4*(4+2).
>
> I'm able to reproduce the problem easily doing the following test:
>
> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
> echo 3 >/proc/sys/vm/drop_caches
> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
>
> The error happens soon after starting the read test.
>
> As can be seen in the data below, client3_3_readv_cbk() is processing an
> iovec of 116 bytes, however it should be of 154 bytes (the buffer in
> memory really seems to contain 154 bytes). The data on the network seems
> ok (at least I haven't been able to identify any problem), so this must
> be a processing error on the client side.
>
> The last field in the cut buffer of the serialized data corresponds to
> the length of the xdata field: 0x26. So at least 38 more bytes should be
> present.

Nice detective work, Xavi.  It would be *very* interesting to see what
the value of the "count" parameter is (it's unfortunately optimized out).
I'll bet it's two, and iov[1].iov_len is 38.  I have a weak memory of
some problems with how this iov is put together, a couple of years ago,
and it looks like you might have tripped over one more.  Maybe it's
related to all that epoll stuff.