I think I've found the problem.

1567         case SP_STATE_READING_PROC_HEADER:
1568                 __socket_proto_read (priv, ret);
1569
1570                 /* there can be 'xdata' in read response, figure it out */
1571                 xdrmem_create (&xdr, proghdr_buf, default_read_size,
1572                                XDR_DECODE);
1573
1574                 /* This will fail if there is xdata sent from server, if not,
1575                    well and good, we don't need to worry about  */
1576                 xdr_gfs3_read_rsp (&xdr, &read_rsp);
1577
1578                 free (read_rsp.xdata.xdata_val);
1579
1580                 /* need to round off to proper roof (%4), as XDR packing pads
1581                    the end of opaque object with '0' */
1582                 size = roof (read_rsp.xdata.xdata_len, 4);
1583
1584                 if (!size) {
1585                         frag->call_body.reply.accepted_success_state
1586                                 = SP_STATE_READ_PROC_OPAQUE;
1587                         goto read_proc_opaque;
1588                 }
1589
1590                 __socket_proto_init_pending (priv, size);
1591
1592                 frag->call_body.reply.accepted_success_state
1593                         = SP_STATE_READING_PROC_OPAQUE;
1594
1595         case SP_STATE_READING_PROC_OPAQUE:
1596                 __socket_proto_read (priv, ret);
1597
1598                 frag->call_body.reply.accepted_success_state
1599                         = SP_STATE_READ_PROC_OPAQUE;
On line 1568 we read, at most, 116 bytes, because we calculate the size of a read response without xdata. Then we detect that we really need more data for the xdata (by the way, is read_rsp.xdata.xdata_val always allocated, even if xdr_gfs3_read_rsp() fails?).
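
For reference, default_read_size comes from the SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT state quoted further down in this thread. Roughly, it is computed over an empty response, so it cannot account for the xdata bytes:

        gfs3_read_rsp read_rsp = {0, };

        /* size of a readv response whose xdata is empty; the real xdata
           payload is therefore not covered by this first read */
        default_read_size = xdr_sizeof ((xdrproc_t) xdr_gfs3_read_rsp,
                                        &read_rsp);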

So we reach line 1596 with the pending info initialized to read the remaining data. This is the __socket_proto_read macro:

166 /* This will be used in a switch case and breaks from the switch case if all
167  * the pending data is not read.
168  */
169 #define __socket_proto_read(priv, ret)                                  \
170         {                                                               \
171                 size_t bytes_read = 0;                                  \
172                 struct gf_sock_incoming *in;                            \
173                 in = &priv->incoming;                                   \
174                                                                         \
175                 __socket_proto_update_pending (priv);                   \
176                                                                         \
177                 ret = __socket_readv (this,                             \
178                                       in->pending_vector, 1,            \
179                                       &in->pending_vector,              \
180                                       &in->pending_count,               \
181                                       &bytes_read);                     \
182                 if (ret == -1)                                          \
183                         break;                                          \
184                 __socket_proto_update_priv_after_read (priv, ret, bytes_read); \
185         }

We read from the socket using __socket_readv(). If it fails, we quit. However, if the socket doesn't have more data to read, this function does not return -1:

555                         ret = __socket_cached_read (this, opvector, opcount);
556
557                         if (ret == 0) {
558                                 gf_log (this->name, GF_LOG_DEBUG, "EOF on socket");
559                                 errno = ENODATA;
560                                 ret = -1;
561                         }
562                         if (ret == -1 && errno == EAGAIN) {
563                                 /* done for now */
564                                 break;
565                         }
566                         this->total_bytes_read += ret;
If __socket_cached_read() fails with errno == EAGAIN, we break and return opcount, which is >= 0, causing the processing to continue instead of waiting for more data.

As a side note, there's another problem here: if errno is not EAGAIN, we still update this->total_bytes_read, subtracting one because ret is -1. This shouldn't be done when ret < 0.
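
A minimal sketch of what I mean (just guarding the accounting, not a tested patch):

        if (ret > 0)
                this->total_bytes_read += ret;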

There are other places where ret is set to -1 but opcount is returned. I guess we should also set opcount = -1 in those places, but I don't have deep knowledge of this implementation.
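
For instance, if the EOF branch quoted above is one of those paths, the change could look roughly like this (only a sketch; I haven't traced all the exit paths):

        if (ret == 0) {
                gf_log (this->name, GF_LOG_DEBUG, "EOF on socket");
                errno = ENODATA;
                ret = -1;
                opcount = -1;   /* make the caller see the failure too */
        }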

I've done a quick test checking for (ret != 0) instead of (ret == -1) in __socket_proto_read() and it seemed to work.
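
In other words, the quick test only changes the check inside the macro (a sketch of the change, not a reviewed patch):

        if (ret != 0)   /* was: if (ret == -1); also break while data
                           is still pending, not only on error */
                break;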

Could anyone with more knowledge of the communications layer verify this and explain what the best solution would be?

Xavi

On 29/04/16 14:52, Xavier Hernandez wrote:
With your patch applied, it seems that the bug is not hit.

I guess it's a timing issue that the new logging hides. Maybe there's no more data available after reading the partial readv header? (It will arrive later.)

I'll continue testing...

Xavi

On 29/04/16 13:48, Raghavendra Gowdappa wrote:
Attaching the patch.

----- Original Message -----
From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 5:14:02 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?



----- Original Message -----
From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 1:21:57 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?

Hi Raghavendra,

yes, the readv response contains xdata. The dict length is 38 (0x26) and, at the moment of failure, rsp.xdata.xdata_len already contains 0x26.

rsp.xdata.xdata_len having 0x26 even when decoding failed indicates that the approach used in socket.c to get the length of xdata is correct. However, I cannot find any other way of xdata going into the payload vector other than xdata_len being zero. Just to be double sure, I have a patch containing a debug message that prints xdata_len when decoding fails in socket.c. Can you please apply the patch, run the tests and report back with the results?


Xavi

On 29/04/16 09:10, Raghavendra Gowdappa wrote:


----- Original Message -----
From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 12:36:43 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?



----- Original Message -----
From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Xavier Hernandez" <xhernan...@datalab.es>
Cc: "Jeff Darcy" <jda...@redhat.com>, "Gluster Devel"
<gluster-devel@gluster.org>
Sent: Friday, April 29, 2016 12:07:59 PM
Subject: Re: [Gluster-devel] Possible bug in the communications
layer ?



----- Original Message -----
From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Jeff Darcy" <jda...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Thursday, April 28, 2016 8:15:36 PM
Subject: Re: [Gluster-devel] Possible bug in the communications layer ?



Hi Jeff,

On 28.04.2016 15:20, Jeff Darcy wrote:



This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The volume is a distributed-disperse 4*(4+2). I'm able to reproduce the problem easily doing the following test:

iozone -t2 -s10g -r1024k -i0 -w -F/iozone{1..2}.dat
echo 3 >/proc/sys/vm/drop_caches
iozone -t2 -s10g -r1024k -i1 -w -F/iozone{1..2}.dat

The error happens soon after starting the read test. As can be seen in the data below, client3_3_readv_cbk() is processing an iovec of 116 bytes, however it should be of 154 bytes (the buffer in memory really seems to contain 154 bytes). The data on the network seems ok (at least I haven't been able to identify any problem), so this must be a processing error on the client side. The last field in the cut buffer of the serialized data corresponds to the length of the xdata field: 0x26. So at least 38 more bytes should be present.
Nice detective work, Xavi. It would be *very* interesting to see what the value of the "count" parameter is (it's unfortunately optimized out). I'll bet it's two, and iov[1].iov_len is 38. I have a weak memory of some problems with how this iov is put together, a couple of years ago, and it looks like you might have tripped over one more.
It seems you are right. The count is 2 and the first 38 bytes of the second vector contain the remaining data of the xdata field.

This is the bug. client3_3_readv_cbk (and for that matter all the actors/cbks) expects the response in at most two vectors:

1. The program header containing the request or response. This is subjected to decoding/encoding. This vector should point to a buffer that contains the entire program header/response contiguously.
2. If the procedure returns a payload (like a readv response or a write request), the second vector contains the buffer pointing to the entire (contiguous) payload. Note that this payload is raw and is not subjected to encoding/decoding.

In your case, this _clean_ separation is broken, with part of the program header slipping into the 2nd vector that is supposed to contain the read data (maybe because of rpc fragmentation). I think this is a bug in the socket layer. I'll update more on this.

Does your read response include xdata too? I think the code related to reading xdata in the readv response is a bit murky.

<socket.c/__socket_read_accepted_successful_reply>

        case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
                default_read_size = xdr_sizeof ((xdrproc_t) xdr_gfs3_read_rsp,
                                                &read_rsp);

                proghdr_buf = frag->fragcurrent;

                __socket_proto_init_pending (priv, default_read_size);

                frag->call_body.reply.accepted_success_state
                        = SP_STATE_READING_PROC_HEADER;

                /* fall through */

        case SP_STATE_READING_PROC_HEADER:
                __socket_proto_read (priv, ret);

By this time we've read read response _minus_ the xdata

I meant we have read "readv response header"


                /* there can be 'xdata' in read response, figure it out */
                xdrmem_create (&xdr, proghdr_buf, default_read_size,
                               XDR_DECODE);

We created the xdr stream above with "default_read_size" (this doesn't include xdata).

                /* This will fail if there is xdata sent from server, if not,
                   well and good, we don't need to worry about  */

what if xdata is present and decoding failed (as the length of the xdr stream above - default_read_size - doesn't include xdata)? would we have a valid value in read_rsp.xdata.xdata_len? This is the part I am confused about. If read_rsp.xdata.xdata_len is not correct then there is a possibility that xdata might not be entirely present in the vector socket passes to higher layers as progheader (with part or entire xdata spilling over to the payload vector).

                xdr_gfs3_read_rsp (&xdr, &read_rsp);

                free (read_rsp.xdata.xdata_val);

                /* need to round off to proper roof (%4), as XDR packing pads
                   the end of opaque object with '0' */

                if (!size) {
                        frag->call_body.reply.accepted_success_state
                                = SP_STATE_READ_PROC_OPAQUE;
                        goto read_proc_opaque;
                }

</socket.c>

Can you please confirm whether there was an xdata in the readv response (maybe by looking in the bricks) whose decoding failed?

regards,
Raghavendra