Re: [Gluster-devel] Possible bug in the communications layer ?

2016-05-09 Thread Xavier Hernandez

I've uploaded a patch for this problem:

http://review.gluster.org/14270

Any review will be very appreciated :)

Thanks,

Xavi

On 09/05/16 12:35, Raghavendra Gowdappa wrote:



- Original Message -

From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Monday, May 9, 2016 3:07:16 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?

Hi Raghavendra,

I've finally found the bug. It was obvious but I didn't see it.


Same here :).



  1561 case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
  1562 default_read_size = xdr_sizeof ((xdrproc_t)
xdr_gfs3_read_rsp,
  1563 &read_rsp);
  1564
  1565 proghdr_buf = frag->fragcurrent;
  1566
  1567 __socket_proto_init_pending (priv,
default_read_size);
  1568
  1569 frag->call_body.reply.accepted_success_state
  1570 = SP_STATE_READING_PROC_HEADER;
  1571
  1572 /* fall through */
  1573
  1574 case SP_STATE_READING_PROC_HEADER:
  1575 __socket_proto_read (priv, ret);
  1576
  1577 gf_trace_add("xdrmem_create", default_read_size,
(uintptr_t)proghdr_buf);
  1578 /* there can be 'xdata' in read response, figure
it out */
  1579 xdrmem_create (&xdr, proghdr_buf, default_read_size,
  1580XDR_DECODE);
  1581
  1582 /* This will fail if there is xdata sent from
server, if not,
  1583well and good, we don't need to worry about  */
  1584 xdr_gfs3_read_rsp (&xdr, &read_rsp);
  1585
  1586 free (read_rsp.xdata.xdata_val);
  1587
  1588 /* need to round off to proper roof (%4), as XDR
packing pads
  1589the end of opaque object with '0' */
  1590 size = roof (read_rsp.xdata.xdata_len, 4);
  1591
  1592 if (!size) {
  1593 frag->call_body.reply.accepted_success_state
  1594 = SP_STATE_READ_PROC_OPAQUE;
  1595 goto read_proc_opaque;
  1596 }
  1597
  1598 __socket_proto_init_pending (priv, size);
  1599
  1600 frag->call_body.reply.accepted_success_state
  1601 = SP_STATE_READING_PROC_OPAQUE;

The main problem here is that we are using two local variables
(proghdr_buf and default_read_size) in two distinct states that might be
called at different times.

The particular case that is failing is the following:

1. In state SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT, everything is prepared
to read 116 bytes. default_read_size is set to 116 and proghdr_buf
points to the buffer where data will be written.

2. In state SP_STATE_READING_PROC_HEADER, a partial read of 88 bytes is
done. At this point the function returns and proghdr_buf and
default_read_size are lost.

3. When more data is available, this function is called again and it
starts executing at state SP_STATE_READING_PROC_HEADER.

4. The remaining 28 bytes are read.

5. When it checks the buffer and tries to decode it to see if there's
xdata present, it uses the default values of proghdr_buf and
default_read_size, which are 0. This causes the decode to leave
read_rsp.xdata.xdata_len set to 0.

6. The program interprets that xdata_len being 0 means that there's no
xdata, so it continues reading the remainder of the RPC packet into the
payload buffer.

If you want, I can send a patch for this.


Yes. That would be helpful. The analysis is correct and moving initialization 
of proghdr_buf to line 1578 will fix the issue. If you are too busy, please let 
me know and I can patch it up too :).
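
As a rough, standalone illustration of the pattern being described (this is
not the actual socket.c patch, and names like parse_ctx/feed_bytes are
hypothetical): anything a later state of a resumable parser needs after a
partial read has to live in the persistent per-connection context, or be
recomputed there, never in function-local variables:

    #include <stdio.h>
    #include <string.h>

    enum parse_state { ST_HEADER_INIT, ST_READING_HEADER, ST_DONE };

    struct parse_ctx {
            enum parse_state state;
            char             header[116]; /* buffer the header is assembled in */
            size_t           header_size; /* persisted, not a function local   */
            size_t           filled;      /* bytes received so far             */
    };

    /* Called whenever more bytes arrive; may return early on a partial read,
     * just like __socket_proto_read() breaking out of the switch. */
    static void
    feed_bytes (struct parse_ctx *ctx, const char *buf, size_t len)
    {
            switch (ctx->state) {
            case ST_HEADER_INIT:
                    ctx->header_size = sizeof (ctx->header); /* e.g. xdr_sizeof() */
                    ctx->filled = 0;
                    ctx->state = ST_READING_HEADER;
                    /* fall through */
            case ST_READING_HEADER: {
                    size_t want = ctx->header_size - ctx->filled;
                    size_t take = (len < want) ? len : want;

                    memcpy (ctx->header + ctx->filled, buf, take);
                    ctx->filled += take;
                    if (ctx->filled < ctx->header_size)
                            return; /* partial read: resume here on the next call */

                    /* decode ctx->header here; ctx->header_size is still valid
                     * even though we may have returned and re-entered above */
                    ctx->state = ST_DONE;
                    break;
            }
            case ST_DONE:
                    break;
            }
    }

    int
    main (void)
    {
            struct parse_ctx ctx = { .state = ST_HEADER_INIT };
            char first[88] = {0}, rest[28] = {0};

            feed_bytes (&ctx, first, sizeof (first)); /* 88 of 116 bytes arrive */
            feed_bytes (&ctx, rest, sizeof (rest));   /* the remaining 28 bytes */
            printf ("state=%d filled=%zu\n", (int) ctx.state, ctx.filled);
            return 0;
    }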

Thanks for debugging the issue :).

regards,
Raghavendra.



Xavi

On 05/05/16 10:21, Xavier Hernandez wrote:

I've undone all changes and now I'm unable to reproduce the problem, so
the modification I did is probably incorrect and not the root cause, as
you described.

I'll continue investigating...

Xavi

On 04/05/16 15:01, Xavier Hernandez wrote:

On 04/05/16 14:47, Raghavendra Gowdappa wrote:



- Original Message -

From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Wednesday, May 4, 2016 5:37:56 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?

I think I've found the problem.

1567 case SP_STATE_READING_PROC_HEADER:
  1568 __socket_proto_read (priv, ret);
  1569
  1570 /* there can be 'xdata' in read response, figure
it out */
  1571 xdrmem_create (&xdr, proghdr_buf,
default_read_size,
  1572 

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-05-09 Thread Raghavendra Gowdappa


- Original Message -
> From: "Xavier Hernandez" <xhernan...@datalab.es>
> To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Monday, May 9, 2016 3:07:16 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> 
> Hi Raghavendra,
> 
> I've finally found the bug. It was obvious but I didn't see it.

Same here :).

> 
>   1561 case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
>   1562 default_read_size = xdr_sizeof ((xdrproc_t)
> xdr_gfs3_read_rsp,
>   1563 &read_rsp);
>   1564
>   1565 proghdr_buf = frag->fragcurrent;
>   1566
>   1567 __socket_proto_init_pending (priv,
> default_read_size);
>   1568
>   1569 frag->call_body.reply.accepted_success_state
>   1570 = SP_STATE_READING_PROC_HEADER;
>   1571
>   1572 /* fall through */
>   1573
>   1574 case SP_STATE_READING_PROC_HEADER:
>   1575 __socket_proto_read (priv, ret);
>   1576
>   1577 gf_trace_add("xdrmem_create", default_read_size,
> (uintptr_t)proghdr_buf);
>   1578 /* there can be 'xdata' in read response, figure
> it out */
>   1579 xdrmem_create (&xdr, proghdr_buf, default_read_size,
>   1580XDR_DECODE);
>   1581
>   1582 /* This will fail if there is xdata sent from
> server, if not,
>   1583well and good, we don't need to worry about  */
>   1584 xdr_gfs3_read_rsp (&xdr, &read_rsp);
>   1585
>   1586 free (read_rsp.xdata.xdata_val);
>   1587
>   1588 /* need to round off to proper roof (%4), as XDR
> packing pads
>   1589the end of opaque object with '0' */
>   1590 size = roof (read_rsp.xdata.xdata_len, 4);
>   1591
>   1592 if (!size) {
>   1593 frag->call_body.reply.accepted_success_state
>   1594 = SP_STATE_READ_PROC_OPAQUE;
>   1595 goto read_proc_opaque;
>   1596 }
>   1597
>   1598 __socket_proto_init_pending (priv, size);
>   1599
>   1600 frag->call_body.reply.accepted_success_state
>   1601 = SP_STATE_READING_PROC_OPAQUE;
> 
> The main problem here is that we are using two local variables
> (proghdr_buf and default_read_size) in two distinct states that might be
> called at different times.
> 
> The particular case that is failing is the following:
> 
> 1. In state SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT, everything is prepared
> to read 116 bytes. default_read_size is set to 116 and proghdr_buf
> points to the buffer where data will be written.
> 
> 2. In state SP_STATE_READING_PROC_HEADER, a partial read of 88 bytes is
> done. At this point the function returns and proghdr_buf and
> default_read_size are lost.
> 
> 3. When more data is available, this function is called again and it
> starts executing at state SP_STATE_READING_PROC_HEADER.
> 
> 4. The remaining 28 bytes are read.
> 
> 5. When it checks the buffer and tries to decode it to see if there's
> xdata present, it uses the default values of proghdr_buf and
> default_read_size, that are 0. This causes the decode to leave
> read_rsp.xdata.xdata_len set to 0.
> 
> 6. The program interprets that xdata_len being 0 means that there's no
> xdata, so it continues reading the remaining of the RPC packet into the
> payload buffer.
> 
> If you want, I can send a patch for this.

Yes. That would be helpful. The analysis is correct and moving initialization 
of proghdr_buf to line 1578 will fix the issue. If you are too busy, please let 
me know and I can patch it up too :).

Thanks for debugging the issue :).

regards,
Raghavendra.

> 
> Xavi
> 
> On 05/05/16 10:21, Xavier Hernandez wrote:
> > I've undone all changes and now I'm unable to reproduce the problem, so
> > the modification I did is probably incorrect and not the root cause, as
> > you described.
> >
> > I'll continue investigating...
> >
> > Xavi
> >
> > On 04/05/16 15:01, Xavier Hernandez wrote:
> >> On 04/05/16 14:47, Raghavendra Gowdappa wrote:
> >>>
> >>>
> >>> - Original Message -
> >>>> From: "Xavier Hernandez" <xhernan...@datalab.es>
> >>>> To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> >>>

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-05-09 Thread Xavier Hernandez

Hi Raghavendra,

I've finally found the bug. It was obvious but I didn't see it.

 1561 case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
 1562 default_read_size = xdr_sizeof ((xdrproc_t) 
xdr_gfs3_read_rsp,

 1563 &read_rsp);
 1564
 1565 proghdr_buf = frag->fragcurrent;
 1566
 1567 __socket_proto_init_pending (priv, 
default_read_size);

 1568
 1569 frag->call_body.reply.accepted_success_state
 1570 = SP_STATE_READING_PROC_HEADER;
 1571
 1572 /* fall through */
 1573
 1574 case SP_STATE_READING_PROC_HEADER:
 1575 __socket_proto_read (priv, ret);
 1576
 1577 gf_trace_add("xdrmem_create", default_read_size, 
(uintptr_t)proghdr_buf);
 1578 /* there can be 'xdata' in read response, figure 
it out */

 1579 xdrmem_create (&xdr, proghdr_buf, default_read_size,
 1580XDR_DECODE);
 1581
 1582 /* This will fail if there is xdata sent from 
server, if not,

 1583well and good, we don't need to worry about  */
 1584 xdr_gfs3_read_rsp (&xdr, &read_rsp);
 1585
 1586 free (read_rsp.xdata.xdata_val);
 1587
 1588 /* need to round off to proper roof (%4), as XDR 
packing pads

 1589the end of opaque object with '0' */
 1590 size = roof (read_rsp.xdata.xdata_len, 4);
 1591
 1592 if (!size) {
 1593 frag->call_body.reply.accepted_success_state
 1594 = SP_STATE_READ_PROC_OPAQUE;
 1595 goto read_proc_opaque;
 1596 }
 1597
 1598 __socket_proto_init_pending (priv, size);
 1599
 1600 frag->call_body.reply.accepted_success_state
 1601 = SP_STATE_READING_PROC_OPAQUE;

The main problem here is that we are using two local variables 
(proghdr_buf and default_read_size) in two distinct states that might be 
called at different times.


The particular case that is failing is the following:

1. In state SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT, everything is prepared 
to read 116 bytes. default_read_size is set to 116 and proghdr_buf 
points to the buffer where data will be written.


2. In state SP_STATE_READING_PROC_HEADER, a partial read of 88 bytes is 
done. At this point the function returns and proghdr_buf and 
default_read_size are lost.


3. When more data is available, this function is called again and it 
starts executing at state SP_STATE_READING_PROC_HEADER.


4. The remaining 28 bytes are read.

5. When it checks the buffer and tries to decode it to see if there's 
xdata present, it uses the default values of proghdr_buf and 
default_read_size, which are 0. This causes the decode to leave 
read_rsp.xdata.xdata_len set to 0.


6. The program interprets that xdata_len being 0 means that there's no 
xdata, so it continues reading the remainder of the RPC packet into the 
payload buffer.


If you want, I can send a patch for this.

Xavi

On 05/05/16 10:21, Xavier Hernandez wrote:

I've undone all changes and now I'm unable to reproduce the problem, so
the modification I did is probably incorrect and not the root cause, as
you described.

I'll continue investigating...

Xavi

On 04/05/16 15:01, Xavier Hernandez wrote:

On 04/05/16 14:47, Raghavendra Gowdappa wrote:



- Original Message -

From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Wednesday, May 4, 2016 5:37:56 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?

I think I've found the problem.

1567 case SP_STATE_READING_PROC_HEADER:
  1568 __socket_proto_read (priv, ret);
  1569
  1570 /* there can be 'xdata' in read response, figure
it out */
  1571 xdrmem_create (&xdr, proghdr_buf,
default_read_size,
  1572XDR_DECODE);
  1573
  1574 /* This will fail if there is xdata sent from
server, if not,
  1575well and good, we don't need to worry
about  */
  1576 xdr_gfs3_read_rsp (&xdr, &read_rsp);
  1577
  1578 free (read_rsp.xdata.xdata_val);
  1579
  1580 /* need to round off to proper roof (%4), as XDR
packing pads
  1581the end of opaque object with '0' */
  1582 size = roof (read_rsp.xdata.xdata_len, 4);
  1583
  1584 if (!size) {
  1585
frag->call_body.reply.accepted_success_state
  1586 = SP_STATE_READ_PROC_OPAQUE;
  1587 goto read_proc_opaque;
  1588 }
  1589
  159

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-05-04 Thread Xavier Hernandez

On 04/05/16 14:47, Raghavendra Gowdappa wrote:



- Original Message -

From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Wednesday, May 4, 2016 5:37:56 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?

I think I've found the problem.

1567 case SP_STATE_READING_PROC_HEADER:
  1568 __socket_proto_read (priv, ret);
  1569
  1570 /* there can be 'xdata' in read response, figure
it out */
  1571 xdrmem_create (&xdr, proghdr_buf, default_read_size,
  1572XDR_DECODE);
  1573
  1574 /* This will fail if there is xdata sent from
server, if not,
  1575well and good, we don't need to worry about  */
  1576 xdr_gfs3_read_rsp (&xdr, &read_rsp);
  1577
  1578 free (read_rsp.xdata.xdata_val);
  1579
  1580 /* need to round off to proper roof (%4), as XDR
packing pads
  1581the end of opaque object with '0' */
  1582 size = roof (read_rsp.xdata.xdata_len, 4);
  1583
  1584 if (!size) {
  1585 frag->call_body.reply.accepted_success_state
  1586 = SP_STATE_READ_PROC_OPAQUE;
  1587 goto read_proc_opaque;
  1588 }
  1589
  1590 __socket_proto_init_pending (priv, size);
  1591
  1592 frag->call_body.reply.accepted_success_state
  1593 = SP_STATE_READING_PROC_OPAQUE;
  1594
  1595 case SP_STATE_READING_PROC_OPAQUE:
  1596 __socket_proto_read (priv, ret);
  1597
  1598 frag->call_body.reply.accepted_success_state
  1599 = SP_STATE_READ_PROC_OPAQUE;

On line 1568 we read, at most, 116 bytes because we calculate the size
of a read response without xdata. Then we detect that we really need
more data for xdata (BTW, read_rsp.xdata.xdata_val will be always
allocated even if xdr_gfs3_read_rsp() fails ?)


No. It need not be. It's guaranteed that only on a successful completion it is 
allocated. However, _if_ decoding fails only because xdr stream doesn't include 
xdata bits, but xdata_len is zero (by initializing it to default_read_size), 
then xdr library would've filled read_rsp.xdata.xdata_len 
(read_rsp.xdata.xdata_val can still be NULL).


The question is: is it guaranteed that after an unsuccessful completion 
xdata_val will be NULL (i.e. not touched by the function, even if 
xdata_len is != 0)? Otherwise the free() could corrupt memory.
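
For what it's worth, here is a small self-contained experiment (using the
standard Sun RPC / TI-RPC xdr routines, with a made-up mini_rsp structure
standing in for gfs3_read_rsp, so it proves nothing about the real decoder by
itself) showing the failure mode under discussion: the length field decodes,
the opaque payload doesn't, and as long as the response struct starts out
zeroed the unconditional free() stays safe, because xdata_val is either still
NULL or points at memory the library allocated:

    #include <stdlib.h>
    #include <string.h>
    #include <rpc/rpc.h> /* xdrmem_create, xdr_u_int, xdr_bytes */

    /* Hypothetical miniature of a reply that ends in an opaque blob. */
    struct mini_rsp {
            u_int  op_ret;
            char  *xdata_val;
            u_int  xdata_len;
    };

    static bool_t
    xdr_mini_rsp (XDR *xdrs, struct mini_rsp *rsp)
    {
            if (!xdr_u_int (xdrs, &rsp->op_ret))
                    return FALSE;
            /* decodes the length first, then tries to read that many bytes */
            return xdr_bytes (xdrs, &rsp->xdata_val, &rsp->xdata_len, 1024);
    }

    int
    main (void)
    {
            /* Deliberately short buffer: it carries op_ret and the xdata
             * length (0x26 = 38) but none of the xdata payload, mimicking a
             * header that was sized without xdata. */
            char            buf[8] = {0};
            XDR             xdr;
            struct mini_rsp rsp;

            buf[7] = 0x26;

            memset (&rsp, 0, sizeof (rsp)); /* val == NULL, len == 0 to start */
            xdrmem_create (&xdr, buf, sizeof (buf), XDR_DECODE);
            if (!xdr_mini_rsp (&xdr, &rsp)) {
                    /* decode failed; xdata_len may already hold 38, and
                     * xdata_val is either NULL or an allocated buffer */
            }
            free (rsp.xdata_val); /* safe either way: free(NULL) is a no-op */
            return 0;
    }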






So we get into line 1596 with the pending info initialized to read the
remaining data. This is the __socket_proto_read macro:

   166 /* This will be used in a switch case and breaks from the switch
case if all
   167  * the pending data is not read.
   168  */
   169 #define __socket_proto_read(priv, ret)
   \
   170 {
   \
   171 size_t bytes_read = 0;
   \
   172 struct gf_sock_incoming *in;
   \
   173 in = &priv->incoming;
   \
   174
   \
   175 __socket_proto_update_pending (priv);
   \
   176
   \
   177 ret = __socket_readv (this,
   \
   178   in->pending_vector, 1,
   \
   179   &in->pending_vector,
   \
   180   &in->pending_count,
   \
   181   &bytes_read);
   \
   182 if (ret == -1)
   \
   183 break;
   \
   184 __socket_proto_update_priv_after_read (priv, ret,
bytes_read); \
   185 }

We read from the socket using __socket_readv(). If it fails, we quit.
However, if the socket doesn't have more data to read, this function does
not return -1:

   555 ret = __socket_cached_read (this,
opvector, opcount);
   556
   557 if (ret == 0) {
   558
gf_log(this->name,GF_LOG_DEBUG,"EOF on socket");
   559 errno = ENODATA;
   560 ret = -1;
   561 }
   562 if (ret == -1 && errno == EAGAIN) {
   563 /* done for now */
   564 break;
   565 }
   566 this->total_bytes_read += ret;

If __socket_cached_read() fails with errno == EAGAIN, we break and
return opcount, which is >= 0, causing the process to continue instead
of waiting for more data.


No. If you observe, there is a call to another macro 
__socket_proto_update_priv_after_rea

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-05-04 Thread Raghavendra Gowdappa


- Original Message -
> From: "Xavier Hernandez" <xhernan...@datalab.es>
> To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Wednesday, May 4, 2016 5:37:56 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> 
> I think I've found the problem.
> 
> 1567 case SP_STATE_READING_PROC_HEADER:
>   1568 __socket_proto_read (priv, ret);
>   1569
>   1570 /* there can be 'xdata' in read response, figure
> it out */
>   1571 xdrmem_create (&xdr, proghdr_buf, default_read_size,
>   1572XDR_DECODE);
>   1573
>   1574 /* This will fail if there is xdata sent from
> server, if not,
>   1575well and good, we don't need to worry about  */
>   1576 xdr_gfs3_read_rsp (&xdr, &read_rsp);
>   1577
>   1578 free (read_rsp.xdata.xdata_val);
>   1579
>   1580 /* need to round off to proper roof (%4), as XDR
> packing pads
>   1581the end of opaque object with '0' */
>   1582 size = roof (read_rsp.xdata.xdata_len, 4);
>   1583
>   1584 if (!size) {
>   1585 frag->call_body.reply.accepted_success_state
>   1586 = SP_STATE_READ_PROC_OPAQUE;
>   1587 goto read_proc_opaque;
>   1588 }
>   1589
>   1590 __socket_proto_init_pending (priv, size);
>   1591
>   1592 frag->call_body.reply.accepted_success_state
>   1593 = SP_STATE_READING_PROC_OPAQUE;
>   1594
>   1595 case SP_STATE_READING_PROC_OPAQUE:
>   1596 __socket_proto_read (priv, ret);
>   1597
>   1598 frag->call_body.reply.accepted_success_state
>   1599 = SP_STATE_READ_PROC_OPAQUE;
> 
> On line 1568 we read, at most, 116 bytes because we calculate the size
> of a read response without xdata. Then we detect that we really need
> more data for xdata (BTW, read_rsp.xdata.xdata_val will be always
> allocated even if xdr_gfs3_read_rsp() fails ?)

No. It need not be. It's guaranteed that only on a successful completion it is 
allocated. However, _if_ decoding fails only because xdr stream doesn't include 
xdata bits, but xdata_len is zero (by initializing it to default_read_size), 
then xdr library would've filled read_rsp.xdata.xdata_len 
(read_rsp.xdata.xdata_val can still be NULL).

> 
> So we get into line 1596 with the pending info initialized to read the
> remaining data. This is the __socket_proto_read macro:
> 
>166 /* This will be used in a switch case and breaks from the switch
> case if all
>167  * the pending data is not read.
>168  */
>169 #define __socket_proto_read(priv, ret)
>\
>170 {
>\
>171 size_t bytes_read = 0;
>\
>172 struct gf_sock_incoming *in;
>\
>173 in = &priv->incoming;
>\
>174
>\
>175 __socket_proto_update_pending (priv);
>\
>176
>\
>177 ret = __socket_readv (this,
>\
>178   in->pending_vector, 1,
>\
>179   &in->pending_vector,
>\
>180   &in->pending_count,
>\
>181   &bytes_read);
>\
>182 if (ret == -1)
>\
>183 break;
>\
>184 __socket_proto_update_priv_after_read (priv, ret,
> bytes_read); \
>185 }
> 
> We read from the socket using __socket_readv(). It it fails, we quit.
> However if the socket doesn't have more data to read, this function does
> not return -1:
> 
>555 ret = __socket_cached_read (this,
> opvector, opcount);
>556
>557 if (ret == 0) {
>558
> gf_log(this->name,GF_LOG_DEBUG,"EOF on socket");
>559 errno = ENODATA;
>560 ret = -1;
>561 }
>562 if (ret == -1 && errno == EAGAIN) {
>563 /* done for now */
>564 break;
>565 }
>566 this->tota

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-05-04 Thread Xavier Hernandez
>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 5:14:02 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?



- Original Message -

From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 1:21:57 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?

Hi Raghavendra,

yes, the readv response contains xdata. The dict length is 38 (0x26)
and, at the moment of failure, rsp.xdata.xdata_len already contains
0x26.


rsp.xdata.xdata_len having 0x26 even when decoding failed indicates
that the
approach used in socket.c to get the length of xdata is correct.
However, I
cannot find any other way of xdata going into payload vector other than
xdata_len being zero. Just to be double sure, I've a patch containing
debug
message printing xdata_len when decoding fails in socket.c. Can you
please
apply the patch, run the tests and revert back with results?



Xavi

On 29/04/16 09:10, Raghavendra Gowdappa wrote:



- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 12:36:43 PM
Subject: Re: [Gluster-devel] Possible bug in the communications
layer ?



- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Jeff Darcy" <jda...@redhat.com>, "Gluster Devel"
<gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 12:07:59 PM
Subject: Re: [Gluster-devel] Possible bug in the communications
layer ?



- Original Message -

From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Jeff Darcy" <jda...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Thursday, April 28, 2016 8:15:36 PM
Subject: Re: [Gluster-devel] Possible bug in the communications
layer
?



Hi Jeff,

On 28.04.2016 15:20, Jeff Darcy wrote:



This happens with Gluster 3.7.11 accessed through Ganesha and
gfapi.
The
volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
problem
easily doing the following test: iozone -t2 -s10g -r1024k -i0 -w
-F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2
-s10g
-r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after
starting
the
read test. As can be seen in the data below,
client3_3_readv_cbk() is
processing an iovec of 116 bytes, however it should be of 154 bytes
(the
buffer in memory really seems to contain 154 bytes). The data on
the
network
seems ok (at least I haven't been able to identify any problem), so
this
must be a processing error on the client side. The last field in
cut
buffer
of the sequentialized data corresponds to the length of the xdata
field:
0x26. So at least 38 more byte should be present.
Nice detective work, Xavi.  It would be *very* interesting to
see what
the value of the "count" parameter is (it's unfortunately optimized
out).
I'll bet it's two, and iov[1].iov_len is 38.  I have a weak
memory of
some problems with how this iov is put together, a couple of years
ago,
and it looks like you might have tripped over one more.
It seems you are right. The count is 2 and the first 38 bytes of
the
second
vector contain the remaining data of the xdata field.


This is the bug. client3_3_readv_cbk (and for that matter all the
actors/cbks) expects the response in at most two vectors:
1. Program header containing request or response. This is
subjected to
decoding/encoding. This vector should point to a buffer that
contains
the
entire program header/response contiguously.
2. If the procedure returns payload (like readv response or a write
request),
second vector contains the buffer pointing to the entire
(contiguous)
payload. Note that this payload is raw and is not subjected to
encoding/decoding.

In your case, this _clean_ separation is broken with part of program
header
slipping into 2nd vector supposed to contain read data (may be
because
of
rpc fragmentation). I think this is a bug in socket layer. I'll
update
more
on this.


Does your read response include xdata too? I think the code
related to
reading xdata in readv response is a bit murky.



case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
default_read_size = xdr_sizeof ((xdrproc_t)
xdr_gfs3_read_rsp,
&read_rsp);

proghdr_buf = frag->fragcurrent;

__socket_proto_init_pending (priv,
default_read_size);

frag->call_body.reply.accepted_success_state

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-29 Thread Xavier Hernandez

With your patch applied, it seems that the bug is not hit.

I guess it's a timing issue that the new logging hides. Maybe no more 
data available after reading the partial readv header ? (it will arrive 
later).


I'll continue testing...

Xavi

On 29/04/16 13:48, Raghavendra Gowdappa wrote:

Attaching the patch.

- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 5:14:02 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?



- Original Message -

From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 1:21:57 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?

Hi Raghavendra,

yes, the readv response contains xdata. The dict length is 38 (0x26)
and, at the moment of failure, rsp.xdata.xdata_len already contains 0x26.


rsp.xdata.xdata_len having 0x26 even when decoding failed indicates that the
approach used in socket.c to get the length of xdata is correct. However, I
cannot find any other way of xdata going into payload vector other than
xdata_len being zero. Just to be double sure, I've a patch containing debug
message printing xdata_len when decoding fails in socket.c. Can you please
apply the patch, run the tests and revert back with results?



Xavi

On 29/04/16 09:10, Raghavendra Gowdappa wrote:



- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 12:36:43 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?



- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Jeff Darcy" <jda...@redhat.com>, "Gluster Devel"
<gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 12:07:59 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?



- Original Message -----

From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Jeff Darcy" <jda...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Thursday, April 28, 2016 8:15:36 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer
?



Hi Jeff,

On 28.04.2016 15:20, Jeff Darcy wrote:



This happens with Gluster 3.7.11 accessed through Ganesha and gfapi.
The
volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
problem
easily doing the following test: iozone -t2 -s10g -r1024k -i0 -w
-F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2 -s10g
-r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after
starting
the
read test. As can be seen in the data below, client3_3_readv_cbk() is
processing an iovec of 116 bytes, however it should be of 154 bytes
(the
buffer in memory really seems to contain 154 bytes). The data on the
network
seems ok (at least I haven't been able to identify any problem), so
this
must be a processing error on the client side. The last field in cut
buffer
of the sequentialized data corresponds to the length of the xdata
field:
0x26. So at least 38 more byte should be present.
Nice detective work, Xavi.  It would be *very* interesting to see what
the value of the "count" parameter is (it's unfortunately optimized
out).
I'll bet it's two, and iov[1].iov_len is 38.  I have a weak memory of
some problems with how this iov is put together, a couple of years
ago,
and it looks like you might have tripped over one more.
It seems you are right. The count is 2 and the first 38 bytes of the
second
vector contain the remaining data of the xdata field.


This is the bug. client3_3_readv_cbk (and for that matter all the
actors/cbks) expects the response in at most two vectors:
1. Program header containing request or response. This is subjected to
decoding/encoding. This vector should point to a buffer that contains
the
entire program header/response contiguously.
2. If the procedure returns payload (like readv response or a write
request),
second vector contains the buffer pointing to the entire (contiguous)
payload. Note that this payload is raw and is not subjected to
encoding/decoding.

In your case, this _clean_ separation is broken with part of program
header
slipping into 2nd vector supposed to contain read data (may be because
of
rpc fragmentation). I think this is a bug in socket layer. I'll update
more
on this.


Does your read response include xdata too? I think the code related to
reading xdata in readv response

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-29 Thread Raghavendra Gowdappa
Attaching the patch.

- Original Message -
> From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> To: "Xavier Hernandez" <xhernan...@datalab.es>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Friday, April 29, 2016 5:14:02 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> 
> 
> 
> - Original Message -
> > From: "Xavier Hernandez" <xhernan...@datalab.es>
> > To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> > Cc: "Gluster Devel" <gluster-devel@gluster.org>
> > Sent: Friday, April 29, 2016 1:21:57 PM
> > Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> > 
> > Hi Raghavendra,
> > 
> > yes, the readv response contains xdata. The dict length is 38 (0x26)
> > and, at the moment of failure, rsp.xdata.xdata_len already contains 0x26.
> 
> rsp.xdata.xdata_len having 0x26 even when decoding failed indicates that the
> approach used in socket.c to get the length of xdata is correct. However, I
> cannot find any other way of xdata going into payload vector other than
> xdata_len being zero. Just to be double sure, I've a patch containing debug
> message printing xdata_len when decoding fails in socket.c. Can you please
> apply the patch, run the tests and revert back with results?
> 
> > 
> > Xavi
> > 
> > On 29/04/16 09:10, Raghavendra Gowdappa wrote:
> > >
> > >
> > > ----- Original Message -
> > >> From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> > >> To: "Xavier Hernandez" <xhernan...@datalab.es>
> > >> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> > >> Sent: Friday, April 29, 2016 12:36:43 PM
> > >> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> > >>
> > >>
> > >>
> > >> - Original Message -
> > >>> From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> > >>> To: "Xavier Hernandez" <xhernan...@datalab.es>
> > >>> Cc: "Jeff Darcy" <jda...@redhat.com>, "Gluster Devel"
> > >>> <gluster-devel@gluster.org>
> > >>> Sent: Friday, April 29, 2016 12:07:59 PM
> > >>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> > >>>
> > >>>
> > >>>
> > >>> - Original Message -
> > >>>> From: "Xavier Hernandez" <xhernan...@datalab.es>
> > >>>> To: "Jeff Darcy" <jda...@redhat.com>
> > >>>> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> > >>>> Sent: Thursday, April 28, 2016 8:15:36 PM
> > >>>> Subject: Re: [Gluster-devel] Possible bug in the communications layer
> > >>>> ?
> > >>>>
> > >>>>
> > >>>>
> > >>>> Hi Jeff,
> > >>>>
> > >>>> On 28.04.2016 15:20, Jeff Darcy wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi.
> > >>>> The
> > >>>> volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
> > >>>> problem
> > >>>> easily doing the following test: iozone -t2 -s10g -r1024k -i0 -w
> > >>>> -F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2 -s10g
> > >>>> -r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after
> > >>>> starting
> > >>>> the
> > >>>> read test. As can be seen in the data below, client3_3_readv_cbk() is
> > >>>> processing an iovec of 116 bytes, however it should be of 154 bytes
> > >>>> (the
> > >>>> buffer in memory really seems to contain 154 bytes). The data on the
> > >>>> network
> > >>>> seems ok (at least I haven't been able to identify any problem), so
> > >>>> this
> > >>>> must be a processing error on the client side. The last field in cut
> > >>>> buffer
> > >>>> of the sequentialized data corresponds to the length of the xdata
> > >>>> field:
> > >>>> 0x26. So at least 38 more byte should be present.
> > >>>> Nice detective work

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-29 Thread Raghavendra Gowdappa


- Original Message -
> From: "Xavier Hernandez" <xhernan...@datalab.es>
> To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Friday, April 29, 2016 1:21:57 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> 
> Hi Raghavendra,
> 
> yes, the readv response contains xdata. The dict length is 38 (0x26)
> and, at the moment of failure, rsp.xdata.xdata_len already contains 0x26.

rsp.xdata.xdata_len having 0x26 even when decoding failed indicates that the 
approach used in socket.c to get the length of xdata is correct. However, I 
cannot find any other way of xdata going into payload vector other than 
xdata_len being zero. Just to be double sure, I've a patch containing debug 
message printing xdata_len when decoding fails in socket.c. Can you please 
apply the patch, run the tests and revert back with results?

> 
> Xavi
> 
> On 29/04/16 09:10, Raghavendra Gowdappa wrote:
> >
> >
> > - Original Message -
> >> From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> >> To: "Xavier Hernandez" <xhernan...@datalab.es>
> >> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> >> Sent: Friday, April 29, 2016 12:36:43 PM
> >> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> >>
> >>
> >>
> >> - Original Message -
> >>> From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> >>> To: "Xavier Hernandez" <xhernan...@datalab.es>
> >>> Cc: "Jeff Darcy" <jda...@redhat.com>, "Gluster Devel"
> >>> <gluster-devel@gluster.org>
> >>> Sent: Friday, April 29, 2016 12:07:59 PM
> >>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> >>>
> >>>
> >>>
> >>> - Original Message -
> >>>> From: "Xavier Hernandez" <xhernan...@datalab.es>
> >>>> To: "Jeff Darcy" <jda...@redhat.com>
> >>>> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> >>>> Sent: Thursday, April 28, 2016 8:15:36 PM
> >>>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> >>>>
> >>>>
> >>>>
> >>>> Hi Jeff,
> >>>>
> >>>> On 28.04.2016 15:20, Jeff Darcy wrote:
> >>>>
> >>>>
> >>>>
> >>>> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> >>>> volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
> >>>> problem
> >>>> easily doing the following test: iozone -t2 -s10g -r1024k -i0 -w
> >>>> -F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2 -s10g
> >>>> -r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after starting
> >>>> the
> >>>> read test. As can be seen in the data below, client3_3_readv_cbk() is
> >>>> processing an iovec of 116 bytes, however it should be of 154 bytes (the
> >>>> buffer in memory really seems to contain 154 bytes). The data on the
> >>>> network
> >>>> seems ok (at least I haven't been able to identify any problem), so this
> >>>> must be a processing error on the client side. The last field in cut
> >>>> buffer
> >>>> of the sequentialized data corresponds to the length of the xdata field:
> >>>> 0x26. So at least 38 more byte should be present.
> >>>> Nice detective work, Xavi.  It would be *very* interesting to see what
> >>>> the value of the "count" parameter is (it's unfortunately optimized
> >>>> out).
> >>>> I'll bet it's two, and iov[1].iov_len is 38.  I have a weak memory of
> >>>> some problems with how this iov is put together, a couple of years ago,
> >>>> and it looks like you might have tripped over one more.
> >>>> It seems you are right. The count is 2 and the first 38 bytes of the
> >>>> second
> >>>> vector contains the remaining data of xdata field.
> >>>
> >>> This is the bug. client3_3_readv_cbk (and for that matter all the
> >>> actors/cbks) expects response in utmost two vectors:
> >>> 1. Program header containing request or response. This is subjected to
> >>> decoding/encoding. This 

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-29 Thread Xavier Hernandez

Hi Raghavendra,

yes, the readv response contains xdata. The dict length is 38 (0x26) 
and, at the moment of failure, rsp.xdata.xdata_len already contains 0x26.


Xavi

On 29/04/16 09:10, Raghavendra Gowdappa wrote:



- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 12:36:43 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?



- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Jeff Darcy" <jda...@redhat.com>, "Gluster Devel"
<gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 12:07:59 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?



- Original Message -

From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Jeff Darcy" <jda...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Thursday, April 28, 2016 8:15:36 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?



Hi Jeff,

On 28.04.2016 15:20, Jeff Darcy wrote:



This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
problem
easily doing the following test: iozone -t2 -s10g -r1024k -i0 -w
-F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2 -s10g
-r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after starting
the
read test. As can be seen in the data below, client3_3_readv_cbk() is
processing an iovec of 116 bytes, however it should be of 154 bytes (the
buffer in memory really seems to contain 154 bytes). The data on the
network
seems ok (at least I haven't been able to identify any problem), so this
must be a processing error on the client side. The last field in cut
buffer
of the sequentialized data corresponds to the length of the xdata field:
0x26. So at least 38 more byte should be present.
Nice detective work, Xavi.  It would be *very* interesting to see what
the value of the "count" parameter is (it's unfortunately optimized out).
I'll bet it's two, and iov[1].iov_len is 38.  I have a weak memory of
some problems with how this iov is put together, a couple of years ago,
and it looks like you might have tripped over one more.
It seems you are right. The count is 2 and the first 38 bytes of the
second
vector contain the remaining data of the xdata field.


This is the bug. client3_3_readv_cbk (and for that matter all the
actors/cbks) expects the response in at most two vectors:
1. Program header containing request or response. This is subjected to
decoding/encoding. This vector should point to a buffer that contains the
entire program header/response contiguously.
2. If the procedure returns payload (like readv response or a write
request),
second vector contains the buffer pointing to the entire (contiguous)
payload. Note that this payload is raw and is not subjected to
encoding/decoding.

In your case, this _clean_ separation is broken with part of program header
slipping into 2nd vector supposed to contain read data (may be because of
rpc fragmentation). I think this is a bug in socket layer. I'll update more
on this.


Does your read response include xdata too? I think the code related to
reading xdata in readv response is a bit murky.



case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
default_read_size = xdr_sizeof ((xdrproc_t)
xdr_gfs3_read_rsp,
&read_rsp);

proghdr_buf = frag->fragcurrent;

__socket_proto_init_pending (priv, default_read_size);

frag->call_body.reply.accepted_success_state
= SP_STATE_READING_PROC_HEADER;

/* fall through */

case SP_STATE_READING_PROC_HEADER:
__socket_proto_read (priv, ret);


By this time we've read read response _minus_ the xdata


I meant we have read "readv response header"



/* there can be 'xdata' in read response, figure it out */
xdrmem_create (&xdr, proghdr_buf, default_read_size,
   XDR_DECODE);


We created xdr stream above with "default_read_size" (this doesn't
include xdata)


/* This will fail if there is xdata sent from server, if not,
   well and good, we don't need to worry about  */


what if xdata is present and decoding failed (as length of xdr stream
above - default_read_size - doesn't include xdata)? would we have a
valid value in read_rsp.xdata.xdata_len? This is the part I am
confused about. If read_rsp.xdata.xdata_len is not correct then there
is a possibility that 

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-29 Thread Raghavendra Gowdappa


- Original Message -
> From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> To: "Xavier Hernandez" <xhernan...@datalab.es>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Friday, April 29, 2016 12:36:43 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> 
> 
> 
> - Original Message -
> > From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> > To: "Xavier Hernandez" <xhernan...@datalab.es>
> > Cc: "Jeff Darcy" <jda...@redhat.com>, "Gluster Devel"
> > <gluster-devel@gluster.org>
> > Sent: Friday, April 29, 2016 12:07:59 PM
> > Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> > 
> > 
> > 
> > - Original Message -
> > > From: "Xavier Hernandez" <xhernan...@datalab.es>
> > > To: "Jeff Darcy" <jda...@redhat.com>
> > > Cc: "Gluster Devel" <gluster-devel@gluster.org>
> > > Sent: Thursday, April 28, 2016 8:15:36 PM
> > > Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> > > 
> > > 
> > > 
> > > Hi Jeff,
> > > 
> > > On 28.04.2016 15:20, Jeff Darcy wrote:
> > > 
> > > 
> > > 
> > > This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> > > volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
> > > problem
> > > easily doing the following test: iozone -t2 -s10g -r1024k -i0 -w
> > > -F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2 -s10g
> > > -r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after starting
> > > the
> > > read test. As can be seen in the data below, client3_3_readv_cbk() is
> > > processing an iovec of 116 bytes, however it should be of 154 bytes (the
> > > buffer in memory really seems to contain 154 bytes). The data on the
> > > network
> > > seems ok (at least I haven't been able to identify any problem), so this
> > > must be a processing error on the client side. The last field in cut
> > > buffer
> > > of the sequentialized data corresponds to the length of the xdata field:
> > > 0x26. So at least 38 more byte should be present.
> > > Nice detective work, Xavi.  It would be *very* interesting to see what
> > > the value of the "count" parameter is (it's unfortunately optimized out).
> > > I'll bet it's two, and iov[1].iov_len is 38.  I have a weak memory of
> > > some problems with how this iov is put together, a couple of years ago,
> > > and it looks like you might have tripped over one more.
> > > It seems you are right. The count is 2 and the first 38 bytes of the
> > > second
> > > vector contains the remaining data of xdata field.
> > 
> > This is the bug. client3_3_readv_cbk (and for that matter all the
> > actors/cbks) expects response in utmost two vectors:
> > 1. Program header containing request or response. This is subjected to
> > decoding/encoding. This vector should point to a buffer that contains the
> > entire program header/response contiguously.
> > 2. If the procedure returns payload (like readv response or a write
> > request),
> > second vector contains the buffer pointing to the entire (contiguous)
> > payload. Note that this payload is raw and is not subjected to
> > encoding/decoding.
> > 
> > In your case, this _clean_ separation is broken with part of program header
> > slipping into 2nd vector supposed to contain read data (may be because of
> > rpc fragmentation). I think this is a bug in socket layer. I'll update more
> > on this.
> 
> Does your read response include xdata too? I think the code related to
> reading xdata in readv response is a bit murky.
> 
> 
> 
> case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
> default_read_size = xdr_sizeof ((xdrproc_t)
> xdr_gfs3_read_rsp,
> &read_rsp);
> 
> proghdr_buf = frag->fragcurrent;
> 
> __socket_proto_init_pending (priv, default_read_size);
> 
> frag->call_body.reply.accepted_success_state
> = SP_STATE_READING_PROC_HEADER;
> 
> /* fall through */
> 
> case SP_STATE_READING_PROC_HEADER:
> __socket_proto_read (priv, ret);
> 
> >>>>> By this time we've read read response _minus_ the xdata

I

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-29 Thread Raghavendra Gowdappa


- Original Message -
> From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> To: "Xavier Hernandez" <xhernan...@datalab.es>
> Cc: "Jeff Darcy" <jda...@redhat.com>, "Gluster Devel" 
> <gluster-devel@gluster.org>
> Sent: Friday, April 29, 2016 12:07:59 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> 
> 
> 
> - Original Message -
> > From: "Xavier Hernandez" <xhernan...@datalab.es>
> > To: "Jeff Darcy" <jda...@redhat.com>
> > Cc: "Gluster Devel" <gluster-devel@gluster.org>
> > Sent: Thursday, April 28, 2016 8:15:36 PM
> > Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> > 
> > 
> > 
> > Hi Jeff,
> > 
> > On 28.04.2016 15:20, Jeff Darcy wrote:
> > 
> > 
> > 
> > This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> > volume is a distributed-disperse 4*(4+2). I'm able to reproduce the problem
> > easily doing the following test: iozone -t2 -s10g -r1024k -i0 -w
> > -F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2 -s10g
> > -r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after starting
> > the
> > read test. As can be seen in the data below, client3_3_readv_cbk() is
> > processing an iovec of 116 bytes, however it should be of 154 bytes (the
> > buffer in memory really seems to contain 154 bytes). The data on the
> > network
> > seems ok (at least I haven't been able to identify any problem), so this
> > must be a processing error on the client side. The last field in cut buffer
> > of the sequentialized data corresponds to the length of the xdata field:
> > 0x26. So at least 38 more byte should be present.
> > Nice detective work, Xavi.  It would be *very* interesting to see what
> > the value of the "count" parameter is (it's unfortunately optimized out).
> > I'll bet it's two, and iov[1].iov_len is 38.  I have a weak memory of
> > some problems with how this iov is put together, a couple of years ago,
> > and it looks like you might have tripped over one more.
> > It seems you are right. The count is 2 and the first 38 bytes of the second
> > vector contains the remaining data of xdata field.
> 
> This is the bug. client3_3_readv_cbk (and for that matter all the
> actors/cbks) expects response in utmost two vectors:
> 1. Program header containing request or response. This is subjected to
> decoding/encoding. This vector should point to a buffer that contains the
> entire program header/response contiguously.
> 2. If the procedure returns payload (like readv response or a write request),
> second vector contains the buffer pointing to the entire (contiguous)
> payload. Note that this payload is raw and is not subjected to
> encoding/decoding.
> 
> In your case, this _clean_ separation is broken with part of program header
> slipping into 2nd vector supposed to contain read data (may be because of
> rpc fragmentation). I think this is a bug in socket layer. I'll update more
> on this.

Does your read response include xdata too? I think the code related to reading 
xdata in readv response is a bit murky.



case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
default_read_size = xdr_sizeof ((xdrproc_t) xdr_gfs3_read_rsp,
&read_rsp);

proghdr_buf = frag->fragcurrent;

__socket_proto_init_pending (priv, default_read_size);

frag->call_body.reply.accepted_success_state
= SP_STATE_READING_PROC_HEADER;

/* fall through */

case SP_STATE_READING_PROC_HEADER:
__socket_proto_read (priv, ret);

>>>>> By this time we've read read response _minus_ the xdata

/* there can be 'xdata' in read response, figure it out */
xdrmem_create (&xdr, proghdr_buf, default_read_size,
   XDR_DECODE);

>>>>>> We created xdr stream above with "default_read_size" (this doesn't 
>>>>>> include xdata)

/* This will fail if there is xdata sent from server, if not,   


   well and good, we don't need to worry about  */

>>>>>> what if xdata is present and decoding failed (as length of xdr stream 
>>>>>> above - default_read_size - doesn't include xdata)? would we have a 
>>&g

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-29 Thread Raghavendra Gowdappa


- Original Message -
> From: "Xavier Hernandez" <xhernan...@datalab.es>
> To: "Jeff Darcy" <jda...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Thursday, April 28, 2016 8:15:36 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> 
> 
> 
> Hi Jeff,
> 
> On 28.04.2016 15:20, Jeff Darcy wrote:
> 
> 
> 
> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> volume is a distributed-disperse 4*(4+2). I'm able to reproduce the problem
> easily doing the following test: iozone -t2 -s10g -r1024k -i0 -w
> -F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2 -s10g
> -r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after starting the
> read test. As can be seen in the data below, client3_3_readv_cbk() is
> processing an iovec of 116 bytes, however it should be of 154 bytes (the
> buffer in memory really seems to contain 154 bytes). The data on the network
> seems ok (at least I haven't been able to identify any problem), so this
> must be a processing error on the client side. The last field in cut buffer
> of the sequentialized data corresponds to the length of the xdata field:
> 0x26. So at least 38 more byte should be present.
> Nice detective work, Xavi.  It would be *very* interesting to see what
> the value of the "count" parameter is (it's unfortunately optimized out).
> I'll bet it's two, and iov[1].iov_len is 38.  I have a weak memory of
> some problems with how this iov is put together, a couple of years ago,
> and it looks like you might have tripped over one more.
> It seems you are right. The count is 2 and the first 38 bytes of the second
> vector contains the remaining data of xdata field. 

This is the bug. client3_3_readv_cbk (and for that matter all the actors/cbks) 
expects the response in at most two vectors:
1. Program header containing request or response. This is subjected to 
decoding/encoding. This vector should point to a buffer that contains the 
entire program header/response contiguously.
2. If the procedure returns payload (like readv response or a write request), 
second vector contains the buffer pointing to the entire (contiguous) payload. 
Note that this payload is raw and is not subjected to encoding/decoding.

In your case, this _clean_ separation is broken with part of program header 
slipping into 2nd vector supposed to contain read data (may be because of rpc 
fragmentation). I think this is a bug in socket layer. I'll update more on this.
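
As a minimal sketch of that convention (placeholder header fields; the real
gfs3_read_rsp program header carries much more than the two integers decoded
here), the callback side would treat iov[0] as the contiguous XDR-encoded
header and iov[1], when present, as the raw payload:

    #include <stdio.h>
    #include <sys/uio.h>
    #include <rpc/rpc.h> /* xdrmem_create, xdr_u_int */

    /* Roughly how a callback like client3_3_readv_cbk() expects its vectors:
     * iov[0] = complete XDR-encoded header, iov[1] = raw (non-XDR) payload.
     * op_ret/op_errno stand in for the real reply structure. */
    static int
    handle_reply (struct iovec *iov, int count)
    {
            XDR   xdr;
            u_int op_ret = 0, op_errno = 0;

            xdrmem_create (&xdr, (char *) iov[0].iov_base,
                           (u_int) iov[0].iov_len, XDR_DECODE);
            if (!xdr_u_int (&xdr, &op_ret) || !xdr_u_int (&xdr, &op_errno))
                    return -1; /* corresponds to "XDR decoding failed" in the logs */

            if (count == 2) {
                    /* iov[1] is handed to the caller as data; the bug discussed
                     * here is header bytes (the xdata) leaking into its start */
                    printf ("payload of %zu bytes\n", iov[1].iov_len);
            }
            return 0;
    }

    int
    main (void)
    {
            char         hdr[8] = {0};   /* op_ret = 0, op_errno = 0 */
            char         data[16] = {0}; /* pretend read payload     */
            struct iovec iov[2] = {
                    { .iov_base = hdr,  .iov_len = sizeof (hdr)  },
                    { .iov_base = data, .iov_len = sizeof (data) },
            };

            return handle_reply (iov, 2) == 0 ? 0 : 1;
    }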

> The rest of the data in
> the second vector seems the payload of the readv fop, plus a 2 bytes
> padding:
> (gdb) f 0
> #0  client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c,
> count=<optimized out>, myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
> 3021gf_msg (this->name, GF_LOG_ERROR, EINVAL,
> (gdb) print *iov
> $2 = {iov_base = 0x7fdc14b0d018, iov_len = 116}
> (gdb) f 1
> #1  0x7fdc56dafab0 in rpc_clnt_handle_reply
> (clnt=clnt@entry=0x7fdc3c1f4bb0, pollin=pollin@entry=0x7fdc34010f20) at
> rpc-clnt.c:764
> 764 req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
> (gdb) print *pollin
> $3 = {vector = {{iov_base = 0x7fdc14b0d000, iov_len = 140}, {iov_base =
> 0x7fdc14a4d000, iov_len = 32808}, {iov_base = 0x0, iov_len = 0} <repeats ... times>}, count = 2,
> vectored = 1 '\001', private = 0x7fdc340106c0, iobref = 0x7fdc34006660,
> hdr_iobuf = 0x7fdc3c4c07c0, is_reply = 1 '\001'}
> (gdb) f 0
> #0  client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c,
> count=<optimized out>, myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
> 3021gf_msg (this->name, GF_LOG_ERROR, EINVAL,
> (gdb) print iov[1]
> $4 = {iov_base = 0x7fdc14a4d000, iov_len = 32808}
> (gdb) print iov[2]
> $5 = {iov_base = 0x2, iov_len = 140583741974112}
> (gdb) x/128xb 0x7fdc14a4d000
> 0x7fdc14a4d000: 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17
> 0x7fdc14a4d008: 0x00 0x00 0x00 0x02 0x67 0x6c 0x75 0x73
> 0x7fdc14a4d010: 0x74 0x65 0x72 0x66 0x73 0x2e 0x69 0x6e
> 0x7fdc14a4d018: 0x6f 0x64 0x65 0x6c 0x6b 0x2d 0x63 0x6f
> 0x7fdc14a4d020: 0x75 0x6e 0x74 0x00 0x31 0x00 0x00 0x00
> 0x7fdc14a4d028: 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c
> 0x7fdc14a4d030: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d038: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d040: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d048: 0x5c 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-28 Thread Raghavendra Gowdappa
Thanks xavi.

@Prasanna/Rafi/Rtalur,
We had earlier seen "decoding failed" errors in quotad in another user setup. 
Possibly these are related. Can you please take a look?

regards,
Raghavendra

- Original Message -
> From: "Xavier Hernandez" 
> To: "Gluster Devel" 
> Sent: Thursday, April 28, 2016 6:13:01 PM
> Subject: [Gluster-devel] Possible bug in the communications layer ?
> 
> Hi,
> 
> I've seen what seems a bug in the communications layer. The first sign
> is an "XDR decoding failed" error in the logs.
> 
> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> volume is a distributed-disperse 4*(4+2).
> 
> I'm able to reproduce the problem easily doing the following test:
> 
> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
> echo 3 >/proc/sys/vm/drop_caches
> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
> 
> The error happens soon after starting the read test.
> 
> As can be seen in the data below, client3_3_readv_cbk() is processing an
> iovec of 116 bytes, however it should be of 154 bytes (the buffer in
> memory really seems to contain 154 bytes). The data on the network seems
> ok (at least I haven't been able to identify any problem), so this must
> be a processing error on the client side.
> 
> The last field in cut buffer of the sequentialized data corresponds to
> the length of the xdata field: 0x26. So at least 38 more byte should be
> present.
> 
> My guess is that some corner case is hit reading fragmented network
> packets due to a high load.
> 
> Debug information:
> 
> Breakpoint 1, client3_3_readv_cbk (req=0x7f540e64106c,
> iov=0x7f540e6410ac, count=, myframe=0x7f54259a4d54) at
> client-rpc-fops.c:3021
> 3021gf_msg (this->name, GF_LOG_ERROR, EINVAL,
> (gdb) print *iov
> $1 = {iov_base = 0x7f53e994e018, iov_len = 116}
> (gdb) x/116xb 0x7f53e994e018
> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
> 0x7f53e994e028: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e030: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
> 0x7f53e994e040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
> 0x7f53e994e048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
> 0x7f53e994e060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e068: 0x00 0x50 0x00 0x00 0x57 0x22 0x04 0x1f
> 0x7f53e994e070: 0x25 0x38 0x92 0x91 0x57 0x22 0x04 0xb3
> 0x7f53e994e078: 0x03 0x53 0x1b 0x13 0x57 0x22 0x04 0xb3
> 0x7f53e994e080: 0x06 0xf5 0xe1 0x99 0x00 0x00 0x80 0x00
> 0x7f53e994e088: 0x00 0x00 0x00 0x26
> (gdb) x/154xb 0x7f53e994e018
> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
> 0x7f53e994e028: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e030: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
> 0x7f53e994e040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
> 0x7f53e994e048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
> 0x7f53e994e060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e068: 0x00 0x50 0x00 0x00 0x57 0x22 0x04 0x1f
> 0x7f53e994e070: 0x25 0x38 0x92 0x91 0x57 0x22 0x04 0xb3
> 0x7f53e994e078: 0x03 0x53 0x1b 0x13 0x57 0x22 0x04 0xb3
> 0x7f53e994e080: 0x06 0xf5 0xe1 0x99 0x00 0x00 0x80 0x00
> 0x7f53e994e088: 0x00 0x00 0x00 0x26 0x00 0x00 0x00 0x01
> 0x7f53e994e090: 0x00 0x00 0x00 0x17 0x00 0x00 0x00 0x02
> 0x7f53e994e098: 0x67 0x6c 0x75 0x73 0x74 0x65 0x72 0x66
> 0x7f53e994e0a0: 0x73 0x2e 0x69 0x6e 0x6f 0x64 0x65 0x6c
> 0x7f53e994e0a8: 0x6b 0x2d 0x63 0x6f 0x75 0x6e 0x74 0x00
> 0x7f53e994e0b0: 0x31 0x00
> (gdb) bt
> #0  client3_3_readv_cbk (req=0x7f540e64106c, iov=0x7f540e6410ac,
> count=<optimized out>, myframe=0x7f54259a4d54) at client-rpc-fops.c:3021
> #1  0x7f542a677ab0 in rpc_clnt_handle_reply
> (clnt=clnt@entry=0x7f54101cdef0, pollin=pollin@entry=0x7f5491f0) at
> rpc-clnt.c:764
> #2  0x7f542a677d6f in rpc_clnt_notify (trans=,
> mydata=0x7f54101cdf20, event=, 

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-28 Thread Xavier Hernandez
 

I've filed bug https://bugzilla.redhat.com/show_bug.cgi?id=1331502
and added Raghavendra Gowdappa in the CC list (he appears as a
maintainer of RPC). 

Xavi 

On 28.04.2016 18:42, Xavier Hernandez wrote:

> Hi Niels, 
> 
> On 28.04.2016 15:44, Niels de Vos wrote: 
>

>> On Thu, Apr 28, 2016 at 02:43:01PM +0200, Xavier Hernandez wrote:
>>

>>> Hi, I've seen what seems a bug in the communications layer. The
>>> first sign is an "XDR decoding failed" error in the logs. This
>>> happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
>>> volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
>>> problem easily doing the following test:
>>>
>>> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
>>> echo 3 >/proc/sys/vm/drop_caches
>>> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
>>>
>>> The error happens soon after starting the read test.
>> 
>> Do you know if this only happens on disperse volumes, or also with
>> others?
> 
> I have only seen the problem with EC. Not sure if it's because I do
> most of my tests with EC or because other volumes do not manifest this
> problem. However, I've only seen this when I started testing Ganesha.
> I have never seen the problem with FUSE.
> 
> I think the reason is that FUSE is slower than Ganesha (a little more
> than 120 MB/s vs 400 MB/s) and the combination of events needed to
> cause this problem would be much more unlikely to happen on FUSE.
> Since EC also talks to many bricks simultaneously (6 in this case),
> maybe this makes it more sensitive to communication problems compared
> to a replicated volume.
> 
>> If you have captured a network trace, could you provide it to me?
>> You can use 'editcap -s0 ...' to copy only the relevant packets.
>> But, I don't have an issue to download a few GB either if that is
>> easier for you.
>> 
>>> As can be seen in the data below, client3_3_readv_cbk() is
>>> processing an iovec of 116 bytes, however it should be of 154 bytes
>>> (the buffer in memory really seems to contain 154 bytes). The data
>>> on the network seems ok (at least I haven't been able to identify
>>> any problem), so this must be a processing error on the client
>>> side. The last field in the cut buffer of the serialized data
>>> corresponds to the length of the xdata field: 0x26. So at least 38
>>> more bytes should be present. My guess is that some corner case is
>>> hit reading fragmented network packets due to a high load.
>>>
>>> Debug information:
>>>
>>> Breakpoint 1, client3_3_readv_cbk (req=0x7f540e64106c,
>>> iov=0x7f540e6410ac, count=<optimized out>, myframe=0x7f54259a4d54)
>>> at client-rpc-fops.c:3021
>>> 3021 gf_msg (this->name, GF_LOG_ERROR, EINVAL,
>>> (gdb) print *iov
>>> $1 = {iov_base = 0x7f53e994e018, iov_len = 116}
>>> (gdb) x/116xb 0x7f53e994e018
>>> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
>>> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
>> 
>> 
>> Hmm, I'm not sure how this is laid out in memory. 0x80 would be one
>> of the first bytes in the RPC payload, it signals 'last record' for
>> the RPC procedure, and we only send one record anyway. The four
>> bytes combined like (0x80 0x.. 0x.. 0x..) should be (0x80 |
>> rpc-record-size). Reading this in Wireshark from a .pcap.gz is much
>> easier :)
> 
> The RPC header is already decoded here. The 116 bytes are only the
> content of the readv answer, as decoded by the xdr_gfs3_read_rsp()
> function.
> 
> This means that the first 4 bytes are the op_ret, the next 4 are
> op_errno, followed by an encoded iatt (with gfid and ino number as
> the first fields). Then a size field and the length of a stream of
> bytes (corresponding to the encoded xdata).
> 
> I've a network capture. I can upload it if you want, but I think this
> is what you are trying to see:
>

> (gdb) f 1
> #1 0x7fa88cb5bab0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fa870413e30,
> pollin=pollin@entry=0x7fa86400efe0) at rpc-clnt.c:764
> 764 req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
> (gdb) print *pollin
> $1 = {vector = {{iov_base = 0x7fa7ef58, iov_len = 140}, {iov_base =
> 0x7fa7ef46, iov_len = 32808}, {iov_base = 0x0, iov_len = 0} }, count = 2,
> vectored = 1 '01', private = 0x7fa86c00ec60, iobref = 0x7fa86400e560,
> hdr_iobuf = 0x7fa868022ee0, is_reply = 1 '01'}
> (gdb) x/140xb 0x7fa7ef58
> 0x7fa7ef58:     0x00 0x00 0x89 0x68 0x00 0x00 0x00 0x01
> 0x7fa7ef580008: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580010: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580020: 0xc3 0x8e 0x35 0xf0 0x31 0xa1 0x45 0x01
> 0x7fa7ef580028: 0x8a 0x21 0x06 0x4b 0x08 0x4c 0x59 0xdf
> 0x7fa7ef580030: 0x8a 0x21 0x06 0x4b 0x08 0x4c 0x59 0xdf
> 0x7fa7ef580038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
> 0x7fa7ef580040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
> 0x7fa7ef580048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
> 0x7fa7ef580060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580068: 0x00 0x50 0x00 0x00 0x57 0x22 0x33 0x1f
> 0x7fa7ef580070: 0x24 0x0c 0xc2 

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-28 Thread Xavier Hernandez
 

Hi Niels, 

On 28.04.2016 15:44, Niels de Vos wrote: 

> On Thu, Apr 28, 2016 at 02:43:01PM +0200, Xavier Hernandez wrote:
> 
>> Hi, I've seen what seems a bug in the communications layer. The
>> first sign is an "XDR decoding failed" error in the logs. This
>> happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
>> volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
>> problem easily doing the following test:
>>
>> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
>> echo 3 >/proc/sys/vm/drop_caches
>> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
>>
>> The error happens soon after starting the read test.
> 
> Do you know if this only happens on disperse volumes, or also with
> others?

I have only seen the problem with EC. Not sure if it's because I do most
of my tests with EC or because other volumes do not manifest this
problem. However, I've only seen this when I started testing Ganesha. I
have never seen the problem with FUSE.

I think the reason is that FUSE is slower than Ganesha (a little more
than 120 MB/s vs 400 MB/s) and the combination of events needed to cause
this problem would be much more unlikely to happen on FUSE. Since EC also
talks to many bricks simultaneously (6 in this case), maybe this makes it
more sensitive to communication problems compared to a replicated volume.

> If you have captured a network trace, could you provide it to me? You
> can use 'editcap -s0 ...' to copy only the relevant packets. But, I
> don't have an issue to download a few GB either if that is easier for
> you.
> 
>> As can be seen in the data below, client3_3_readv_cbk() is
>> processing an iovec of 116 bytes, however it should be of 154 bytes
>> (the buffer in memory really seems to contain 154 bytes). The data
>> on the network seems ok (at least I haven't been able to identify
>> any problem), so this must be a processing error on the client side.
>> The last field in the cut buffer of the serialized data corresponds
>> to the length of the xdata field: 0x26. So at least 38 more bytes
>> should be present. My guess is that some corner case is hit reading
>> fragmented network packets due to a high load.
>>
>> Debug information:
>>
>> Breakpoint 1, client3_3_readv_cbk (req=0x7f540e64106c,
>> iov=0x7f540e6410ac, count=<optimized out>, myframe=0x7f54259a4d54)
>> at client-rpc-fops.c:3021
>> 3021 gf_msg (this->name, GF_LOG_ERROR, EINVAL,
>> (gdb) print *iov
>> $1 = {iov_base = 0x7f53e994e018, iov_len = 116}
>> (gdb) x/116xb 0x7f53e994e018
>> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
>> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
> 
> Hmm, I'm not sure how this is laid out in memory. 0x80 would be one
> of the first bytes in the RPC payload, it signals 'last record' for
> the RPC procedure, and we only send one record anyway. The four bytes
> combined like (0x80 0x.. 0x.. 0x..) should be (0x80 |
> rpc-record-size). Reading this in Wireshark from a .pcap.gz is much
> easier :)

The RPC header is already decoded here. The 116 bytes are only the
content of the readv answer, as decoded by the xdr_gfs3_read_rsp()
function.

This means that the first 4 bytes are the op_ret, the next 4 are
op_errno, followed by an encoded iatt (with gfid and ino number as the
first fields). Then a size field and the length of a stream of bytes
(corresponding to the encoded xdata).
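
For reference, a rough C sketch of the layout just described. The field
names are assumptions modelled on the xdr_gfs3_read_rsp() decoder
mentioned above, not a verbatim copy of the GlusterFS XDR headers:

    /* Approximate shape of the decoded readv reply: op_ret, op_errno,
     * an encoded iatt, the payload size, and the variable-length xdata. */
    struct sketch_iatt {
            unsigned char      gfid[16]; /* first field of the encoded iatt */
            unsigned long long ino;      /* inode number                    */
            /* ... remaining iatt fields omitted ...                        */
    };

    struct sketch_read_rsp {
            int                op_ret;    /* first 4 bytes of the reply     */
            int                op_errno;  /* next 4 bytes                   */
            struct sketch_iatt stat;      /* encoded iatt                   */
            unsigned int       size;      /* length of the read payload     */
            unsigned int       xdata_len; /* 0x26 = 38 in this capture      */
            char              *xdata_val; /* serialized xdata dictionary    */
    };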

I've a network capture. I can upload it if you want, but I
think this is what you are trying to see:

(gdb) f 1
#1 0x7fa88cb5bab0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fa870413e30,
pollin=pollin@entry=0x7fa86400efe0) at rpc-clnt.c:764
764 req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
(gdb) print *pollin
$1 = {vector = {{iov_base = 0x7fa7ef58, iov_len = 140}, {iov_base =
0x7fa7ef46, iov_len = 32808}, {iov_base = 0x0, iov_len = 0} }, count = 2,
 vectored = 1 '01', private = 0x7fa86c00ec60, iobref = 0x7fa86400e560,
 hdr_iobuf = 0x7fa868022ee0, is_reply = 1 '01'}
(gdb) x/140xb 0x7fa7ef58
0x7fa7ef58:     0x00 0x00 0x89 0x68 0x00 0x00 0x00 0x01
0x7fa7ef580008: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580010: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580020: 0xc3 0x8e 0x35 0xf0 0x31 0xa1 0x45 0x01
0x7fa7ef580028: 0x8a 0x21 0x06 0x4b 0x08 0x4c 0x59 0xdf
0x7fa7ef580030: 0x8a 0x21 0x06 0x4b 0x08 0x4c 0x59 0xdf
0x7fa7ef580038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
0x7fa7ef580040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
0x7fa7ef580048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
0x7fa7ef580060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
0x7fa7ef580068: 0x00 0x50 0x00 0x00 0x57 0x22 0x33 0x1f
0x7fa7ef580070: 0x24 0x0c 0xc2 0x58 0x57 0x22 0x33 0xa1
0x7fa7ef580078: 0x34 0x11 0x3b 0x54 0x57 0x22 0x33 0xa1
0x7fa7ef580080: 0x34 0x20 0x7d 0x7a 0x00 0x00 0x80 0x00
0x7fa7ef580088: 0x00 0x00 0x00 0x26

(gdb) f 3
#3 0x7fa88cb57853 in rpc_transport_notify (this=this@entry=0x7fa870423ab0,

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-28 Thread Xavier Hernandez
 

Hi Jeff, 

On 28.04.2016 15:20, Jeff Darcy wrote: 

>> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi.
>> The volume is a distributed-disperse 4*(4+2). I'm able to reproduce
>> the problem easily doing the following test:
>>
>> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
>> echo 3 >/proc/sys/vm/drop_caches
>> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
>>
>> The error happens soon after starting the read test. As can be seen
>> in the data below, client3_3_readv_cbk() is processing an iovec of
>> 116 bytes, however it should be of 154 bytes (the buffer in memory
>> really seems to contain 154 bytes). The data on the network seems ok
>> (at least I haven't been able to identify any problem), so this must
>> be a processing error on the client side. The last field in the cut
>> buffer of the serialized data corresponds to the length of the xdata
>> field: 0x26. So at least 38 more bytes should be present.
> 
> Nice detective work, Xavi. It would be *very* interesting to see what
> the value of the "count" parameter is (it's unfortunately optimized
> out). I'll bet it's two, and iov[1].iov_len is 38. I have a weak
> memory of some problems with how this iov is put together, a couple
> of years ago, and it looks like you might have tripped over one more.

It seems you are right. The count is 2 and the first 38 bytes of the
second vector contain the remaining data of the xdata field. The rest of
the data in the second vector seems to be the payload of the readv fop,
plus 2 bytes of padding:

(gdb) f 0
#0 client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c,
count=<optimized out>, myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
3021 gf_msg (this->name, GF_LOG_ERROR, EINVAL,
(gdb) print *iov
$2 = {iov_base = 0x7fdc14b0d018, iov_len = 116}
(gdb) f 1
#1 0x7fdc56dafab0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fdc3c1f4bb0,
pollin=pollin@entry=0x7fdc34010f20) at rpc-clnt.c:764
764 req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
(gdb) print *pollin
$3 = {vector = {{iov_base = 0x7fdc14b0d000, iov_len = 140}, {iov_base =
0x7fdc14a4d000, iov_len = 32808}, {iov_base = 0x0, iov_len = 0} }, count = 2,
 vectored = 1 '01', private = 0x7fdc340106c0, iobref = 0x7fdc34006660,
 hdr_iobuf = 0x7fdc3c4c07c0, is_reply = 1 '01'}
(gdb) f 0
#0 client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c,
count=<optimized out>, myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
3021 gf_msg (this->name, GF_LOG_ERROR, EINVAL,
(gdb) print iov[1]
$4 = {iov_base = 0x7fdc14a4d000, iov_len = 32808}
(gdb) print iov[2]
$5 = {iov_base = 0x2, iov_len = 140583741974112}
(gdb) x/128xb 0x7fdc14a4d000
0x7fdc14a4d000: 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17
0x7fdc14a4d008: 0x00 0x00 0x00 0x02 0x67 0x6c 0x75 0x73
0x7fdc14a4d010: 0x74 0x65 0x72 0x66 0x73 0x2e 0x69 0x6e
0x7fdc14a4d018: 0x6f 0x64 0x65 0x6c 0x6b 0x2d 0x63 0x6f
0x7fdc14a4d020: 0x75 0x6e 0x74 0x00 0x31 0x00 0x00 0x00
0x7fdc14a4d028: 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c
0x7fdc14a4d030: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d038: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d040: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d048: 0x5c 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d058: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d060: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d068: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d070: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d078: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
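
To spell out the arithmetic this implies, here is a minimal sanity check,
assuming iov[1] holds the 38 remaining xdata bytes, XDR padding to a
4-byte boundary, and then the readv payload (an illustration, not
GlusterFS code):

    #include <assert.h>
    #include <stdio.h>

    /* XDR pads opaque data to a multiple of 4 bytes. */
    static unsigned int xdr_roundup4(unsigned int len)
    {
            return (len + 3u) & ~3u;
    }

    int main(void)
    {
            unsigned int xdata_len = 0x26;                    /* 38, from the dump */
            unsigned int padded    = xdr_roundup4(xdata_len); /* 40 -> 2 pad bytes */
            unsigned int iov1_len  = 32808;                   /* from print iov[1] */
            unsigned int payload   = iov1_len - padded;       /* 32768 bytes       */

            printf("xdata tail: %u, padding: %u, payload: %u\n",
                   xdata_len, padded - xdata_len, payload);
            assert(payload == 32768);
            return 0;
    }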

> Maybe it's related to all that epoll stuff.

I'm currently using 4 epoll threads (this improves ec performance). I'll
try to repeat the tests with a single epoll thread, but I'm not sure if
this will be enough to reach any conclusion if the problem doesn't
manifest, since ec through fuse with 4 epoll threads doesn't seem to
trigger the problem.

Xavi
 ___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-28 Thread Niels de Vos
On Thu, Apr 28, 2016 at 02:43:01PM +0200, Xavier Hernandez wrote:
> Hi,
> 
> I've seen what seems a bug in the communications layer. The first sign is an
> "XDR decoding failed" error in the logs.
> 
> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> volume is a distributed-disperse 4*(4+2).
> 
> I'm able to reproduce the problem easily doing the following test:
> 
> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
> echo 3 >/proc/sys/vm/drop_caches
> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
> 
> The error happens soon after starting the read test.

Do you know if this only happens on disperse volumes, or also with
others? If you have captured a network trace, could you provide it to
me? You can use 'editcap -s0 ...' to copy only the relevant packets.
But, I don't have an issue to download a few GB either if that is easier
for you.

> As can be seen in the data below, client3_3_readv_cbk() is processing an
> iovec of 116 bytes, however it should be of 154 bytes (the buffer in memory
> really seems to contain 154 bytes). The data on the network seems ok (at
> least I haven't been able to identify any problem), so this must be a
> processing error on the client side.
> 
> The last field in the cut buffer of the serialized data corresponds to the
> length of the xdata field: 0x26. So at least 38 more bytes should be present.
> 
> My guess is that some corner case is hit reading fragmented network packets
> due to a high load.
> 
> Debug information:
> 
> Breakpoint 1, client3_3_readv_cbk (req=0x7f540e64106c, iov=0x7f540e6410ac,
> count=<optimized out>, myframe=0x7f54259a4d54) at client-rpc-fops.c:3021
> 3021 gf_msg (this->name, GF_LOG_ERROR, EINVAL,
> (gdb) print *iov
> $1 = {iov_base = 0x7f53e994e018, iov_len = 116}
> (gdb) x/116xb 0x7f53e994e018
> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e

Hmm, I'm not sure how this is laid out in memory. 0x80 would be one of
the first bytes in the RPC payload, it signals 'last record' for the RPC
procedure, and we only send one record anyway. The four bytes combined
like (0x80 0x.. 0x.. 0x..) should be (0x80 | rpc-record-size). Reading
this in Wireshark from a .pcap.gz is much easier :)
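
For readers unfamiliar with the record marking Niels describes, a minimal
sketch of how such a 4-byte fragment header is normally decoded (generic
ONC RPC record marking, not the GlusterFS implementation):

    #include <stdint.h>

    /* The 4 bytes are big-endian on the wire: the high bit marks the last
     * fragment, the remaining 31 bits carry the fragment size. */
    struct rpc_fragment_hdr {
            int      last_fragment; /* non-zero if this is the final fragment */
            uint32_t size;          /* number of bytes that follow            */
    };

    static struct rpc_fragment_hdr
    decode_fragment_hdr(const unsigned char b[4])
    {
            uint32_t raw = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
            struct rpc_fragment_hdr hdr = {
                    .last_fragment = (raw & 0x80000000u) != 0,
                    .size          = raw & 0x7fffffffu,
            };
            return hdr;
    }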

Don't hesitate to file a bug for this, and include the exact error
messages you get. If possible with network capture, but if you can not
share that publicly, please get it directly to me.

Thanks,
Niels


> 0x7f53e994e028: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e030: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
> 0x7f53e994e040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
> 0x7f53e994e048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
> 0x7f53e994e060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e068: 0x00 0x50 0x00 0x00 0x57 0x22 0x04 0x1f
> 0x7f53e994e070: 0x25 0x38 0x92 0x91 0x57 0x22 0x04 0xb3
> 0x7f53e994e078: 0x03 0x53 0x1b 0x13 0x57 0x22 0x04 0xb3
> 0x7f53e994e080: 0x06 0xf5 0xe1 0x99 0x00 0x00 0x80 0x00
> 0x7f53e994e088: 0x00 0x00 0x00 0x26
> (gdb) x/154xb 0x7f53e994e018
> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
> 0x7f53e994e028: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e030: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
> 0x7f53e994e038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
> 0x7f53e994e040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
> 0x7f53e994e048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
> 0x7f53e994e060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
> 0x7f53e994e068: 0x00 0x50 0x00 0x00 0x57 0x22 0x04 0x1f
> 0x7f53e994e070: 0x25 0x38 0x92 0x91 0x57 0x22 0x04 0xb3
> 0x7f53e994e078: 0x03 0x53 0x1b 0x13 0x57 0x22 0x04 0xb3
> 0x7f53e994e080: 0x06 0xf5 0xe1 0x99 0x00 0x00 0x80 0x00
> 0x7f53e994e088: 0x00 0x00 0x00 0x26 0x00 0x00 0x00 0x01
> 0x7f53e994e090: 0x00 0x00 0x00 0x17 0x00 0x00 0x00 0x02
> 0x7f53e994e098: 0x67 0x6c 0x75 0x73 0x74 0x65 0x72 0x66
> 0x7f53e994e0a0: 0x73 0x2e 0x69 0x6e 0x6f 0x64 0x65 0x6c
> 0x7f53e994e0a8: 0x6b 0x2d

Re: [Gluster-devel] Possible bug in the communications layer ?

2016-04-28 Thread Jeff Darcy
> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> volume is a distributed-disperse 4*(4+2).
> 
> I'm able to reproduce the problem easily doing the following test:
> 
> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
> echo 3 >/proc/sys/vm/drop_caches
> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
> 
> The error happens soon after starting the read test.
> 
> As can be seen in the data below, client3_3_readv_cbk() is processing an
> iovec of 116 bytes, however it should be of 154 bytes (the buffer in
> memory really seems to contain 154 bytes). The data on the network seems
> ok (at least I haven't been able to identify any problem), so this must
> be a processing error on the client side.
> 
> The last field in the cut buffer of the serialized data corresponds to
> the length of the xdata field: 0x26. So at least 38 more bytes should
> be present.


Nice detective work, Xavi.  It would be *very* interesting to see what
the value of the "count" parameter is (it's unfortunately optimized out).
I'll bet it's two, and iov[1].iov_len is 38.  I have a weak memory of
some problems with how this iov is put together, a couple of years ago,
and it looks like you might have tripped over one more.
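
A small illustration of the bet above (assuming count is 2 and
iov[1].iov_len is 38; not GlusterFS code): summing the vectors would
account for the full 154-byte reply that the 116-byte iov[0] alone
appears to be missing.

    #include <stddef.h>
    #include <sys/uio.h>

    /* Total bytes available across all reply vectors: 116 + 38 = 154. */
    static size_t total_reply_bytes(const struct iovec *iov, int count)
    {
            size_t total = 0;
            for (int i = 0; i < count; i++)
                    total += iov[i].iov_len;
            return total;
    }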

 Maybe it's related to all that epoll stuff. 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel