To me this seems like a lot of effort to go through just to detect cases where servers from two different clusters are misconfigured to read each other's files or talk to each other by accident. I wonder if there's an easier way to do it. Maybe keep the cluster name idea, but write it to a marker file in any local storage directories that servers read on bootstrap, and don't load from them if the name is wrong?
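
A minimal sketch of that marker-file check, written in Python only for brevity. The file name `.cluster-name` and the function `check_or_claim_directory` are hypothetical and are not existing Druid names or APIs. It also assumes that a directory without a marker is unclaimed and may be adopted by whichever cluster reads it first, which is one possible way to stay compatible with existing caches.

```python
import tempfile
from pathlib import Path

# Hypothetical marker file name; not an existing Druid convention.
MARKER_FILE = ".cluster-name"


def check_or_claim_directory(storage_dir: Path, cluster_name: str) -> bool:
    """Return True if a server in cluster_name may load from storage_dir.

    A directory with no marker is claimed by writing the marker (assumption).
    A directory whose marker names a different cluster is rejected.
    """
    marker = storage_dir / MARKER_FILE
    if not marker.exists():
        storage_dir.mkdir(parents=True, exist_ok=True)
        marker.write_text(cluster_name, encoding="utf-8")
        return True
    return marker.read_text(encoding="utf-8").strip() == cluster_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "segment-cache"
        # First start: a staging server claims the empty directory.
        print(check_or_claim_directory(cache, "staging"))
        # Later: a prod server lands on the same disk and should refuse it.
        print(check_or_claim_directory(cache, "prod"))
```

Under this sketch a server that gets `False` back would skip (or wipe) the directory on bootstrap instead of announcing whatever segments it finds there. Note that adopting unmarked directories would not have caught a pre-existing, unmarked staging cache, so a stricter variant might refuse any non-empty directory that lacks a marker.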

On Fri, Mar 1, 2019 at 6:17 PM David Glasser <glas...@apollographql.com> wrote:
> Makes sense.
>
> To elaborate a bit more on my "cluster name" concept, I actually think it
> would be pretty straightforward:
>
> - Add something like `druid.cluster.name=staging`.
> - To be compatible with existing data, also add something like
> `druid.cluster.allowSegmentsFromClusters=["", "dev"]`. Note that the empty
> string is explicitly recognized here.
> - Add a `clusterName` field to DataSegment. When creating a new segment,
> set its clusterName field to the value of druid.cluster.name.
> - Make various places that see DataSegments ignore and warn when presented
> with segments whose cluster does not match druid.cluster.name or a value in
> druid.cluster.allowSegmentsFromClusters. This would include
> SegmentLoadDropHandler (which is what looks at the local cache in
> historicals etc), operations that publish new segments, etc.
>
> This might actually be simpler and more efficient than going to the
> database each time, though the database approach could handle other related
> issues I suppose.
>
> On Fri, Mar 1, 2019 at 1:58 PM Jihoon Son <ghoon...@gmail.com> wrote:
>
> > The broker learns from historicals and tasks even though recently a PR has
> > been merged to keep published segments in memory (
> > https://github.com/apache/incubator-druid/pull/6901) in brokers.
> > Probably it makes sense to filter out segments in brokers too if they are
> > from historicals and not in the metadata store.
> >
> > Jihoon
> >
> > On Fri, Mar 1, 2019 at 1:24 PM David Glasser <glas...@apollographql.com>
> > wrote:
> >
> > > That makes sense. Do the coordinator's decisions about what segments are
> > > 'used' affect the broker's choices for routing queries, or does it just
> > > learn about things directly from historicals/ingestion tasks (via...
> > > zookeeper?)
> > >
> > > --dave
> > >
> > > On Fri, Mar 1, 2019 at 1:15 PM Jihoon Son <ghoon...@gmail.com> wrote:
> > >
> > > > Hi Dave,
> > > >
> > > > I think the third option sounds most reasonable to fix this issue, though
> > > > the second option sounds useful in general.
> > > > And yes, it wouldn't be easy to refuse to announce unknown segments in
> > > > historicals.
> > > > I think it makes more sense to check only in the coordinator because it's
> > > > the only node that directly accesses the metadata store (except the
> > > > overlord).
> > > > So, the coordinator could decline to update the "used" flag if the
> > > > overshadowing segments are not in the metadata store.
> > > > In stream ingestion, segments might not be in the metadata store until they
> > > > are published. However, this shouldn't be a problem because segments are
> > > > always appended in stream ingestion.
> > > >
> > > > Jihoon
> > > >
> > > > On Fri, Mar 1, 2019 at 12:49 AM David Glasser <glas...@apollographql.com>
> > > > wrote:
> > > >
> > > > > (I sent this message to druid-user last week and got no response. Since it
> > > > > is proposing making improvements to Druid, I thought maybe it would be
> > > > > appropriate to resend here. Hope that's OK.)
> > > > >
> > > > > We had a big outage in our Druid cluster last week. We run our Druid
> > > > > servers in Kubernetes, and our historicals use machine-local SSDs for their
> > > > > segment caches. We made the unfortunate choice to have our production and
> > > > > staging historicals share the same pool of machines, and today got bit by
> > > > > this for the first time.
> > > > >
> > > > > A production historical started up on a machine whose segment cache
> > > > > contained segments from our staging cluster. Our prod and staging clusters
> > > > > use the same names for data sources.
> > > > >
> > > > > This meant that these segments overshadowed production segments which
> > > > > happened to have lower versions. Worse, when
> > > > > DruidCoordinatorCleanupOvershadowed kicked in, all of the production
> > > > > segments that were overshadowed got used=false set, and quickly got dropped
> > > > > from historicals. This ended up being the majority of our data. We
> > > > > eventually figured out what was going on and did a bunch of manual steps to
> > > > > clean up (turning off and clearing the cache of the two historicals that
> > > > > had staging segments on them, manually setting used=true for all entries in
> > > > > druid_segments, waiting a long long time for data to re-download), but
> > > > > figuring out what was going on was subtle (I was very lucky I had randomly
> > > > > decided to read a lot of the code about how the `used` column works and how
> > > > > versioned timelines are calculated just a few days before!).
> > > > >
> > > > > (We were also lucky that we had turned off coordinator automatic killing
> > > > > literally that morning!)
> > > > >
> > > > > I feel like Druid should have been able to protect me from this to some
> > > > > degree. (Yes, we are going to address the root cause by making it
> > > > > impossible for prod and staging to reuse each other's disks.) Some thoughts
> > > > > on changes that could have helped:
> > > > >
> > > > > - Is the Druid standard to prepend the "cluster" name to the data source
> > > > > name, so that conflicts like this are never possible? We are certainly
> > > > > tempted to do this now but nobody ever told us to. If that's the standard,
> > > > > should it be documented?
> > > > >
> > > > > - Should clusters have an optional name/namespace, and DataSegments have
> > > > > that namespace recorded in them, and clusters refuse to handle segments they
> > > > > find that are from a different namespace? This would be like the common
> > > > > database setup where a single server/cluster has a set of databases which
> > > > > each have a set of tables.
> > > > >
> > > > > - Should historicals refuse to announce segments that don't exist in the
> > > > > druid_segments table, or should coordinators/brokers/etc refuse to pay
> > > > > attention to segments announced *by historicals* that don't exist in the
> > > > > druid_segments table? I'm going to guess this is difficult to do in the
> > > > > historical because the historical probably doesn't actually talk to the SQL
> > > > > DB at all? But maybe it could be done by coordinator and broker?
> > > > >
> > > > > --dave