Hi,

Can you take a look at https://github.com/ceph/ceph/pull/2363 and see if 
that addresses the behavior you saw?

Thanks!
sage


On Fri, 29 Aug 2014, Sage Weil wrote:

> Hi,
> 
> I've opened http://tracker.ceph.com/issues/9285 to track this.
> 
> I think you're right--we need a check in agent_maybe_evict() that will 
> skip objects that are being promoted.  I suspect a flag on the 
> ObjectContext is enough?
> 
> sage
> 
> 
> On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:
> 
> > Hi all,
> > 
> > I ran into this slow request issue some time ago. The problem is this: 
> > when running with cache tiering, 'slow request' warnings like the ones 
> > below appear in the log file.
> > 
> > 2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow requests, 1 
> > included below; oldest blocked for > 30.996595 secs
> > 2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request 
> > 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: 
> > osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 
> > 3440640~4096] 45.cf45084b ack+read e26168) v4 currently waiting for rw locks
> > 
> > Recently I made some changes to the log, captured this problem, and finally 
> > figured out its root cause. You can check the attachment for the logs.
> > 
> > Here is the root cause:
> > A cache miss occurs on a read. During promotion, after copying the data 
> > from the base tier OSD, the cache tier primary OSD replicates the data to 
> > the other cache tier OSDs. Sometimes this takes quite a long time. During 
> > this period, the promoted object may be evicted because the cache tier is 
> > full. When the primary OSD finally gets the replication response and 
> > restarts the original read request, it doesn't find the object in the 
> > cache tier and promotes it again. This loop repeats several times, and we 
> > see the 'slow request' warning in the logs. Theoretically, it could loop 
> > forever, and the client's request would never finish.
> > 
> > There is a simple fix for this:
> > Add a field to the object state indicating the status of the promotion. 
> > It is set to true after the data is copied from the base tier and before 
> > replication starts. It is reset to false after replication completes and 
> > the original client request starts to execute. Eviction is not allowed 
> > while this field is true.
> > 
> > What do you think?
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
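For reference, the flag-based fix proposed in the quoted message can be sketched roughly as below. This is a toy model under stated assumptions, not Ceph code: CacheTier, ObjectState, start_promote, finish_promote, and maybe_evict are hypothetical names chosen for illustration. The reply above suggests the real check would belong in agent_maybe_evict(), with the flag kept on the ObjectContext.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for per-object cache state; not Ceph's ObjectContext.
struct ObjectState {
    // True from "data copied from the base tier" until the original
    // client request has been restarted.
    bool promoting = false;
};

// Toy cache tier keyed by object name; every name here is illustrative.
class CacheTier {
public:
    // Called once the data has been copied from the base tier,
    // before replication to the other cache tier OSDs begins.
    void start_promote(const std::string& oid) {
        objects_[oid].promoting = true;
    }

    // Called after replication finishes and the original request restarts.
    void finish_promote(const std::string& oid) {
        auto it = objects_.find(oid);
        if (it != objects_.end()) it->second.promoting = false;
    }

    // Eviction attempt by the tiering agent. Objects that are still being
    // promoted are skipped. Returns true if the object was evicted.
    bool maybe_evict(const std::string& oid) {
        auto it = objects_.find(oid);
        if (it == objects_.end() || it->second.promoting) return false;
        objects_.erase(it);
        return true;
    }

    bool contains(const std::string& oid) const {
        return objects_.count(oid) != 0;
    }

private:
    std::map<std::string, ObjectState> objects_;
};
```

With this guard, an eviction attempt that arrives between start_promote and finish_promote leaves the object in place, so the restarted read finds it and the promote/evict loop described above cannot start. Eviction works as usual once the flag is cleared.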