Thanks, I've opened a proposal and taken Gian's suggestion into account. https://github.com/apache/incubator-druid/issues/7180
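(Gian's suggestion, quoted below, is to stamp each local segment-cache directory with the owning cluster's name and skip directories stamped by a different cluster. For illustration only, a minimal sketch of that idea; the class `ClusterMarker`, the method `claimOrVerify`, and the `.cluster` file name are hypothetical, not actual Druid APIs, and this is not necessarily what the linked proposal implements.)

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of the marker-file idea described below; none of
// these names are real Druid APIs.
public class ClusterMarker
{
  private static final String MARKER_FILE = ".cluster";

  /**
   * Called once per segment-cache location on bootstrap. Stamps an unclaimed
   * directory with this cluster's name, and returns false (meaning: do not load
   * anything from here) if the directory is already stamped by another cluster.
   */
  public static boolean claimOrVerify(Path cacheDir, String clusterName) throws IOException
  {
    Path marker = cacheDir.resolve(MARKER_FILE);
    if (!Files.exists(marker)) {
      // First use of this directory: claim it for our cluster.
      Files.createDirectories(cacheDir);
      Files.write(marker, clusterName.getBytes(StandardCharsets.UTF_8));
      return true;
    }
    String owner = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
    return clusterName.equals(owner);
  }
}
```

A historical could run such a check against each configured cache location before scanning it, so a production server would simply ignore a cache directory populated by staging rather than announce its segments.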
On Fri, Mar 1, 2019 at 4:20 PM Gian Merlino <g...@apache.org> wrote:

> To me this seems like a lot of effort to go through just to detect cases
> where servers from two different clusters are misconfigured to read each
> others' files or talk to each other by accident. I wonder if there's an
> easier way to do it. Maybe keep the cluster name idea, but write it to a
> marker file in any local storage directories that servers read on
> bootstrap, and don't load up from them if the name is wrong?
>
> On Fri, Mar 1, 2019 at 6:17 PM David Glasser <glas...@apollographql.com> wrote:
>
> > Makes sense.
> >
> > To elaborate a bit more on my "cluster name" concept, I actually think
> > it would be pretty straightforward:
> >
> > - Add something like `druid.cluster.name=staging`.
> > - To be compatible with existing data, also add something like
> > `druid.cluster.allowSegmentsFromClusters=["", "dev"]`. Note that the
> > empty string is explicitly recognized here.
> > - Add a `clusterName` field to DataSegment. When creating a new segment,
> > set its clusterName field to the value of druid.cluster.name.
> > - Make the various places that see DataSegments ignore and warn when
> > presented with segments whose cluster does not match druid.cluster.name
> > or a value in druid.cluster.allowSegmentsFromClusters. This would include
> > SegmentLoadDropHandler (which is what looks at the local cache in
> > historicals etc.), operations that publish new segments, etc.
> >
> > This might actually be simpler and more efficient than going to the
> > database each time, though the database approach could handle other
> > related issues, I suppose.
> >
> > On Fri, Mar 1, 2019 at 1:58 PM Jihoon Son <ghoon...@gmail.com> wrote:
> >
> > > The broker learns about segments from historicals and tasks, although
> > > a PR was recently merged to keep published segments in memory in
> > > brokers (https://github.com/apache/incubator-druid/pull/6901).
> > > It probably makes sense to filter out segments in brokers too if they
> > > come from historicals and are not in the metadata store.
> > >
> > > Jihoon
> > >
> > > On Fri, Mar 1, 2019 at 1:24 PM David Glasser <glas...@apollographql.com> wrote:
> > >
> > > > That makes sense. Do the coordinator's decisions about which
> > > > segments are 'used' affect the broker's choices for routing queries,
> > > > or does it just learn about things directly from
> > > > historicals/ingestion tasks (via... zookeeper?)
> > > >
> > > > --dave
> > > >
> > > > On Fri, Mar 1, 2019 at 1:15 PM Jihoon Son <ghoon...@gmail.com> wrote:
> > > >
> > > > > Hi Dave,
> > > > >
> > > > > I think the third option sounds most reasonable to fix this issue,
> > > > > though the second option sounds useful in general.
> > > > > And yes, it wouldn't be easy to refuse to announce unknown
> > > > > segments in historicals.
> > > > > I think it makes more sense to check only in the coordinator,
> > > > > because it's the only node that directly accesses the metadata
> > > > > store (besides the overlord).
> > > > > So, the coordinator could skip updating the "used" flag if the
> > > > > overshadowing segments are not in the metadata store.
> > > > > In stream ingestion, segments might not be in the metadata store
> > > > > until they are published. However, this shouldn't be a problem
> > > > > because segments are always appended in stream ingestion.
> > > > >
> > > > > Jihoon
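(For illustration, a rough sketch of the coordinator-side guard Jihoon describes above: only mark an overshadowed segment as used=false when every segment overshadowing it is actually present in the metadata store. The class and method names are hypothetical stand-ins, not Druid's real DruidCoordinatorCleanupOvershadowed code.)

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the guard described above; not actual Druid code.
public class OvershadowedCleanupGuard
{
  /** Minimal stand-in for the coordinator's view of the druid_segments table. */
  public interface MetadataStoreView
  {
    boolean contains(String segmentId);
  }

  /**
   * Given the overshadowed segments the cleanup duty wants to mark used=false,
   * keep only those whose overshadowing segments are all known to the metadata
   * store. A segment overshadowed only by "unknown" segments (e.g. ones that
   * leaked in from another cluster's segment cache) is left untouched.
   */
  public static Set<String> safeToMarkUnused(
      Set<String> overshadowedCandidates,
      Map<String, Set<String>> overshadowedBy,   // candidate id -> ids of segments overshadowing it
      MetadataStoreView metadataStore
  )
  {
    return overshadowedCandidates.stream()
        .filter(id -> overshadowedBy.getOrDefault(id, Collections.emptySet())
                                    .stream()
                                    .allMatch(metadataStore::contains))
        .collect(Collectors.toSet());
  }
}
```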
> > > > > On Fri, Mar 1, 2019 at 12:49 AM David Glasser <glas...@apollographql.com> wrote:
> > > > >
> > > > > > (I sent this message to druid-user last week and got no response.
> > > > > > Since it proposes improvements to Druid, I thought it would be
> > > > > > appropriate to resend here. Hope that's OK.)
> > > > > >
> > > > > > We had a big outage in our Druid cluster last week. We run our
> > > > > > Druid servers in Kubernetes, and our historicals use machine-local
> > > > > > SSDs for their segment caches. We made the unfortunate choice to
> > > > > > have our production and staging historicals share the same pool of
> > > > > > machines, and today got bit by this for the first time.
> > > > > >
> > > > > > A production historical started up on a machine whose segment
> > > > > > cache contained segments from our staging cluster. Our prod and
> > > > > > staging clusters use the same names for data sources.
> > > > > >
> > > > > > This meant that these segments overshadowed production segments
> > > > > > which happened to have lower versions. Worse, when
> > > > > > DruidCoordinatorCleanupOvershadowed kicked in, all of the
> > > > > > production segments that were overshadowed got used=false set, and
> > > > > > quickly got dropped from historicals. This ended up being the
> > > > > > majority of our data. We eventually figured out what was going on
> > > > > > and did a bunch of manual steps to clean up (turning off and
> > > > > > clearing the cache of the two historicals that had staging
> > > > > > segments on them, manually setting used=true for all entries in
> > > > > > druid_segments, waiting a long, long time for data to
> > > > > > re-download), but figuring out what was going on was subtle (I was
> > > > > > very lucky I had randomly decided to read a lot of the code about
> > > > > > how the `used` column works and how versioned timelines are
> > > > > > calculated just a few days before!).
> > > > > >
> > > > > > (We were also lucky that we had turned off coordinator automatic
> > > > > > killing literally that morning!)
> > > > > >
> > > > > > I feel like Druid should have been able to protect me from this to
> > > > > > some degree. (Yes, we are going to address the root cause by
> > > > > > making it impossible for prod and staging to reuse each others'
> > > > > > disks.) Some thoughts on changes that could have helped:
> > > > > >
> > > > > > - Is it standard Druid practice to prepend the "cluster" name to
> > > > > > the data source name, so that conflicts like this are never
> > > > > > possible? We are certainly tempted to do this now, but nobody ever
> > > > > > told us to. If that's the standard, should it be documented?
> > > > > >
> > > > > > - Should clusters have an optional name/namespace, with
> > > > > > DataSegments recording that namespace and clusters refusing to
> > > > > > handle segments they find that are from a different namespace?
> > > > > > This would be like the common database setup where a single
> > > > > > server/cluster has a set of databases, which each have a set of
> > > > > > tables.
> > > > > >
> > > > > > - Should historicals refuse to announce segments that don't exist
> > > > > > in the druid_segments table, or should coordinators/brokers/etc.
> > > > > > refuse to pay attention to segments announced *by historicals*
> > > > > > that don't exist in the druid_segments table? I'm going to guess
> > > > > > this is difficult to do in the historical because the historical
> > > > > > probably doesn't actually talk to the SQL DB at all, but maybe it
> > > > > > could be done by the coordinator and broker?
> > > > > >
> > > > > > --dave
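(To make the last option above concrete: a broker or coordinator that already polls druid_segments could cross-check announced segments against that set and ignore anything unknown. A minimal sketch under that assumption, with hypothetical names; this is not how Druid currently behaves.)

```java
import java.util.Set;

// Hypothetical sketch of the "ignore unknown announcements" option; not actual Druid code.
public class AnnouncedSegmentFilter
{
  private final Set<String> knownSegmentIds;   // segment ids polled from druid_segments

  public AnnouncedSegmentFilter(Set<String> knownSegmentIds)
  {
    this.knownSegmentIds = knownSegmentIds;
  }

  /** Returns true if a segment announced by a historical should enter the timeline. */
  public boolean accept(String announcedSegmentId)
  {
    if (knownSegmentIds.contains(announcedSegmentId)) {
      return true;
    }
    // Warn and drop: the segment was announced but never published to the metadata
    // store, e.g. because it was written by a different cluster sharing the disk.
    System.err.println("Ignoring announced segment not present in druid_segments: " + announcedSegmentId);
    return false;
  }
}
```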