Re: [ceph-users] 答复: 答复: 答复: Pipe "deadlock" in Hammer, 0.94.5

2017-03-16 Thread Gregory Farnum
On Thu, Mar 16, 2017 at 3:36 AM 许雪寒  wrote:

> Hi, Gregory, is it possible to unlock Connection::lock in
> Pipe::read_message before tcp_read_nonblocking is called? I checked the
> code again, it seems that the code in tcp_read_nonblocking doesn't need to
> be locked by Connection::lock.


Unfortunately it does. You'll note the memory buffers it's grabbing via the
Connection? Those need to be protected from changing (either being
canceled, or being set up) while the read is being processed.
Now, you could probably do something more complicated around the buffer
update mechanism, or if you know your applications don't make use of it you
could just rip them out entirely. But while that mechanism exists it needs
to be synchronized.
-Greg



>
> -邮件原件-
> 发件人: Gregory Farnum [mailto:gfar...@redhat.com]
> 发送时间: 2017年1月17日 7:14
> 收件人: 许雪寒
> 抄送: jiajia zhong; ceph-users@lists.ceph.com
> 主题: Re: 答复: [ceph-users] 答复: 答复: Pipe "deadlock" in Hammer, 0.94.5
>
> On Sat, Jan 14, 2017 at 7:54 PM, 许雪寒  wrote:
> > Thanks for your help:-)
> >
> > I checked the source code again, and in read_message, it does hold the
> Connection::lock:
>
> You're correct of course; I wasn't looking and forgot about this bit.
> This was added to deal with client-allocated buffers and/or op
> cancellation in librados, IIRC, and unfortunately definitely does need to
> be synchronized — I'm not sure about with pipe lookups, but probably even
> that. :/
>
> Unfortunately it looks like you're running a version that didn't come from
> upstream (I see hash 81d4ad40d0c2a4b73529ff0db3c8f22acd15c398 in another
> email, which I can't find), so there's not much we can do to help with the
> specifics of this case — it's fiddly and my guess would be the same as
> Sage's, which you say is not the case.
> -Greg
>
> >
> >  while (left > 0) {
> > // wait for data
> > if (tcp_read_wait() < 0)
> > goto out_dethrottle;
> >
> > // get a buffer
> > connection_state->lock.Lock();
> > map
> >::iterator p =
> >
>  connection_state->rx_buffers.find(header.tid);
> > if (p != connection_state->rx_buffers.end()) {
> > if (rxbuf.length() == 0 ||
> p->second.second != rxbuf_version) {
> > ldout(msgr->cct,10)
> > <<
> "reader seleting rx buffer v "
> >
><< p->second.second << " at offset "
> >
><< offset << " len "
> >
><< p->second.first.length() << dendl;
> > rxbuf = p->second.first;
> > rxbuf_version = p->second.second;
> > // make sure it's big enough
> > if (rxbuf.length() < data_len)
> > rxbuf.push_back(
> >
>  buffer::create(data_len - rxbuf.length()));
> > blp = p->second.first.begin();
> > blp.advance(offset);
> > }
> > } else {
> > if (!newbuf.length()) {
> > ldout(msgr->cct,20)
> > <<
> "reader allocating new rx buffer at offset "
> >
><< offset << dendl;
> > alloc_aligned_buffer(newbuf,
> data_len, data_off);
> > blp = newbuf.begin();
> > blp.advance(offset);
> > }
> > }
> > bufferptr bp = blp.get_current_ptr();
> > int read = MIN(bp.length(), left);
> > ldout(msgr->cct,20)
> > << "reader reading
> nonblocking into "
> > <<
> (void*) bp.c_str() << " len " << bp.length()
> > << dendl;
> > int got = tcp_read_nonblocking(bp.c_str(), read);
> > ldout(msgr->cct,30)
> > << "reader read " << got
> << " of " << read << dendl;
> > connection_state->lock.Unlock();
> > if (got < 0)
> > goto out_dethrottle;
> > if (got > 0) {
> > blp.advance(got);
> > data.append(bp, 0, got);
> > offset += got;
> > lef

Re: [ceph-users] 答复: 答复: 答复: Pipe "deadlock" in Hammer, 0.94.5

2017-01-16 Thread Gregory Farnum
On Sat, Jan 14, 2017 at 7:54 PM, 许雪寒  wrote:
> Thanks for your help:-)
>
> I checked the source code again, and in read_message, it does hold the 
> Connection::lock:

You're correct of course; I wasn't looking and forgot about this bit.
This was added to deal with client-allocated buffers and/or op
cancellation in librados, IIRC, and unfortunately definitely does need
to be synchronized — I'm not sure about with pipe lookups, but
probably even that. :/

Unfortunately it looks like you're running a version that didn't come
from upstream (I see hash 81d4ad40d0c2a4b73529ff0db3c8f22acd15c398 in
another email, which I can't find), so there's not much we can do to
help with the specifics of this case — it's fiddly and my guess would
be the same as Sage's, which you say is not the case.
-Greg

>
>  while (left > 0) {
> // wait for data
> if (tcp_read_wait() < 0)
> goto out_dethrottle;
>
> // get a buffer
> connection_state->lock.Lock();
> map >::iterator p =
> 
> connection_state->rx_buffers.find(header.tid);
> if (p != connection_state->rx_buffers.end()) {
> if (rxbuf.length() == 0 || p->second.second 
> != rxbuf_version) {
> ldout(msgr->cct,10)
> << "reader 
> seleting rx buffer v "
>   
>   << p->second.second << " at offset "
>   
>   << offset << " len "
>   
>   << p->second.first.length() << dendl;
> rxbuf = p->second.first;
> rxbuf_version = p->second.second;
> // make sure it's big enough
> if (rxbuf.length() < data_len)
> rxbuf.push_back(
> 
> buffer::create(data_len - rxbuf.length()));
> blp = p->second.first.begin();
> blp.advance(offset);
> }
> } else {
> if (!newbuf.length()) {
> ldout(msgr->cct,20)
> << "reader 
> allocating new rx buffer at offset "
>   
>   << offset << dendl;
> alloc_aligned_buffer(newbuf, 
> data_len, data_off);
> blp = newbuf.begin();
> blp.advance(offset);
> }
> }
> bufferptr bp = blp.get_current_ptr();
> int read = MIN(bp.length(), left);
> ldout(msgr->cct,20)
> << "reader reading 
> nonblocking into "
> << (void*) 
> bp.c_str() << " len " << bp.length()
> << dendl;
> int got = tcp_read_nonblocking(bp.c_str(), read);
> ldout(msgr->cct,30)
> << "reader read " << got << " 
> of " << read << dendl;
> connection_state->lock.Unlock();
> if (got < 0)
> goto out_dethrottle;
> if (got > 0) {
> blp.advance(got);
> data.append(bp, 0, got);
> offset += got;
> left -= got;
> } // else we got a signal or something; just loop.
> }
>
> As shown in the above code, in the reading loop, it first lock 
> connection_state->lock and then do tcp_read_nonblocking. connection_state is 
> of type PipeConnectionRef, connection_state->lock is Connection::lock.
>
> On the other hand, I'll check that whether there are a lot of message to send 
> as you suggested. Thanks:-)
>
>
>
> 发件人: Gregory Farnum [gfar...@redhat.com]
>
> 发送时间: 2017年1月14日 9:39
>
> 收件人: 许雪寒
>
> Cc: jiajia zhong; ceph-users@lists.ceph.com
>
> 主题: Re: [ceph-users] 答复: 答复: Pipe "deadlock" in Hammer, 0.94.5
>
>
>
>
>
>
>
>
>
> On Thu, Jan 12, 2017 at 7:58 PM, 许雪寒  

[ceph-users] 答复: 答复: 答复: Pipe "deadlock" in Hammer, 0.94.5

2017-01-14 Thread 许雪寒
Thanks for your help:-)

I checked the source code again, and in read_message, it does hold the 
Connection::lock:

 while (left > 0) {
// wait for data
if (tcp_read_wait() < 0)
goto out_dethrottle;

// get a buffer
connection_state->lock.Lock();
map >::iterator p =

connection_state->rx_buffers.find(header.tid);
if (p != connection_state->rx_buffers.end()) {
if (rxbuf.length() == 0 || p->second.second != 
rxbuf_version) {
ldout(msgr->cct,10)
<< "reader 
seleting rx buffer v "

<< p->second.second << " at offset "

<< offset << " len "

<< p->second.first.length() << dendl;
rxbuf = p->second.first;
rxbuf_version = p->second.second;
// make sure it's big enough
if (rxbuf.length() < data_len)
rxbuf.push_back(

buffer::create(data_len - rxbuf.length()));
blp = p->second.first.begin();
blp.advance(offset);
}
} else {
if (!newbuf.length()) {
ldout(msgr->cct,20)
<< "reader 
allocating new rx buffer at offset "

<< offset << dendl;
alloc_aligned_buffer(newbuf, data_len, 
data_off);
blp = newbuf.begin();
blp.advance(offset);
}
}
bufferptr bp = blp.get_current_ptr();
int read = MIN(bp.length(), left);
ldout(msgr->cct,20)
<< "reader reading nonblocking 
into "
<< (void*) 
bp.c_str() << " len " << bp.length()
<< dendl;
int got = tcp_read_nonblocking(bp.c_str(), read);
ldout(msgr->cct,30)
<< "reader read " << got << " 
of " << read << dendl;
connection_state->lock.Unlock();
if (got < 0)
goto out_dethrottle;
if (got > 0) {
blp.advance(got);
data.append(bp, 0, got);
offset += got;
left -= got;
} // else we got a signal or something; just loop.
}

As shown in the above code, in the reading loop, it first lock 
connection_state->lock and then do tcp_read_nonblocking. connection_state is of 
type PipeConnectionRef, connection_state->lock is Connection::lock.

On the other hand, I'll check that whether there are a lot of message to send 
as you suggested. Thanks:-)



发件人: Gregory Farnum [gfar...@redhat.com]

发送时间: 2017年1月14日 9:39

收件人: 许雪寒

Cc: jiajia zhong; ceph-users@lists.ceph.com

主题: Re: [ceph-users] 答复: 答复: Pipe "deadlock" in Hammer, 0.94.5









On Thu, Jan 12, 2017 at 7:58 PM, 许雪寒  wrote:










Thank you for your continuous helpJ.
 
We are using hammer 0.94.5 version, and what I read is the version of the 
source code.

However, on the other hand, if Pipe::do_recv do act as blocked, is it 
reasonable for the Pipe::reader_thread to block threads calling 
SimpleMessenger::submit_message
 by holding Connection::lock?
 
I think maybe a different mutex should be used in Pipe::read_message rather 
than Connection::lock.







I don't think it does use that lock. Pipe::read_message() is generally called 
while the pipe_lock is held, but not Connection::lock. (They are separate.) 

I haven't dug into the relevant OSD code in a while, but I think it's a lot 
more likely your OSD is just overloaded and is taking a while to send a lot of 
different messages, and that the loop it's in doesn't update the HeartbeatMap 
or something.