On Thu, 25 Jul 2013, Gregory Farnum wrote:
> On Thu, Jul 25, 2013 at 4:28 PM, Sage Weil <s...@inktank.com> wrote:
> > On Thu, 25 Jul 2013, Gregory Farnum wrote:
> >> On Thu, Jul 25, 2013 at 4:01 PM, Sage Weil <s...@inktank.com> wrote:
> >> > I've added a blueprint for avoiding double-writes when using btrfs:
> >> >
> >> >         http://wiki.ceph.com/01Planning/02Blueprints/Emperor/osd:_clone_from_journal_on_btrfs
> >> >
> >> > This should improve throughput significantly when the journal is a file
> >> > in btrfs.
> >> >
> >> > ---
> >> >
> >> > Also, there's one for improving the localized read behavior:
> >> >
> >> >         http://wiki.ceph.com/01Planning/02Blueprints/Emperor/librados%2F%2Fobjecter%3A_smarter_localized_reads
> >> >
> >> > For example, for read-only parents of rbd clones, we may as well read
> >> > from the replica in the same host or rack or row--whatever crush can
> >> > tell us--and not the primary.  This is good for locality and load
> >> > distribution when certain object sets are hot.
> >>
> >> This blueprint includes work items to set locality information in
> >> libcephfs and via the Hadoop bindings. However, there's still a read
> >> hole issue with read-from-replicas [1] that makes this generally
> >> unwise. Did you consider that when writing this blueprint?
> >> In particular, I think we want to discuss whether we should allow people
> >> to use a more powerful read-from-replica unless we can guarantee their
> >> usage of it is safe (i.e., snapshots).
> >
> > Yeah, there's an open bug for that, but the solution doesn't seem
> > interesting enough to warrant a CDS discussion...
> >
> >         http://tracker.ceph.com/issues/5388
> >
> > But if I'm wrong, by all means write one! :)
> 
> I didn't think we had a solution yet, since your last words there are
> "the fix on the OSD is going to be a bit more involved". :p
> That doesn't mean we shouldn't do this; I just thought it was a
> problem that needed to be part of the blueprint when designing and
> implementing this: whether it's the user's problem to handle properly,
> something we want to lock out in ways we can be reasonably sure are
> safe, or something we expect to be resolved (the local read issue)
> before this is completed.

I'm assuming it's a matter of using the ObjectContexts on the replicas, 
but perhaps not.  In any case, I'm operating on the assumption that this 
is a bug that must be resolved before the smarter localized reads are 
usable, but that the bug isn't interesting enough to discuss.  If you 
disagree, write or update the blueprint :)

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
