Thanks, I've opened a proposal and taken Gian's suggestion into account. https://github.com/apache/incubator-druid/issues/7180
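(Gian's suggestion, quoted below, is to stamp each local segment-cache directory with the owning cluster's name and skip directories stamped by a different cluster. For illustration only, a minimal sketch of that idea; the class `ClusterMarker`, the method `claimOrVerify`, and the `.cluster` file name are hypothetical, not actual Druid APIs, and this is not necessarily what the linked proposal implements.)

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of the marker-file idea described below; none of
// these names are real Druid APIs.
public class ClusterMarker
{
  private static final String MARKER_FILE = ".cluster";

  /**
   * Called once per segment-cache location on bootstrap. Stamps an unclaimed
   * directory with this cluster's name, and returns false (meaning: do not load
   * anything from here) if the directory is already stamped by another cluster.
   */
  public static boolean claimOrVerify(Path cacheDir, String clusterName) throws IOException
  {
    Path marker = cacheDir.resolve(MARKER_FILE);
    if (!Files.exists(marker)) {
      // First use of this directory: claim it for our cluster.
      Files.createDirectories(cacheDir);
      Files.write(marker, clusterName.getBytes(StandardCharsets.UTF_8));
      return true;
    }
    String owner = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
    return clusterName.equals(owner);
  }
}
```

A historical could run such a check against each configured cache location before scanning it, so a production server would simply ignore a cache directory populated by staging rather than announce its segments.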
On Fri, Mar 1, 2019 at 4:20 PM Gian Merlino <g...@apache.org> wrote:

> To me this seems like a lot of effort to go through just to detect cases
> where servers from two different clusters are misconfigured to read each
> others' files or talk to each other by accident. I wonder if there's an
> easier way to do it. Maybe keep the cluster name idea, but write it to a
> marker file in any local storage directories that servers read on
> bootstrap, and don't load up from them if the name is wrong?
>
> On Fri, Mar 1, 2019 at 6:17 PM David Glasser <glas...@apollographql.com> wrote:
>
> > Makes sense.
> >
> > To elaborate a bit more on my "cluster name" concept, I actually think
> > it would be pretty straightforward:
> >
> > - Add something like `druid.cluster.name=staging`.
> > - To be compatible with existing data, also add something like
> > `druid.cluster.allowSegmentsFromClusters=["", "dev"]`. Note that the
> > empty string is explicitly recognized here.
> > - Add a `clusterName` field to DataSegment. When creating a new segment,
> > set its clusterName field to the value of druid.cluster.name.
> > - Make the various places that see DataSegments ignore and warn when
> > presented with segments whose cluster does not match druid.cluster.name
> > or a value in druid.cluster.allowSegmentsFromClusters. This would include
> > SegmentLoadDropHandler (which is what looks at the local cache in
> > historicals etc.), operations that publish new segments, etc.
> >
> > This might actually be simpler and more efficient than going to the
> > database each time, though the database approach could handle other
> > related issues, I suppose.
> >
> > On Fri, Mar 1, 2019 at 1:58 PM Jihoon Son <ghoon...@gmail.com> wrote:
> >
> > > The broker learns about segments from historicals and tasks, although
> > > a PR was recently merged to keep published segments in memory in
> > > brokers (https://github.com/apache/incubator-druid/pull/6901).
> > > It probably makes sense to filter out segments in brokers too if they
> > > come from historicals and are not in the metadata store.
> > >
> > > Jihoon
> > >
> > > On Fri, Mar 1, 2019 at 1:24 PM David Glasser <glas...@apollographql.com> wrote:
> > >
> > > > That makes sense. Do the coordinator's decisions about which
> > > > segments are 'used' affect the broker's choices for routing queries,
> > > > or does it just learn about things directly from
> > > > historicals/ingestion tasks (via... zookeeper?)
> > > >
> > > > --dave
> > > >
> > > > On Fri, Mar 1, 2019 at 1:15 PM Jihoon Son <ghoon...@gmail.com> wrote:
> > > >
> > > > > Hi Dave,
> > > > >
> > > > > I think the third option sounds most reasonable to fix this issue,
> > > > > though the second option sounds useful in general.
> > > > > And yes, it wouldn't be easy to refuse to announce unknown
> > > > > segments in historicals.
> > > > > I think it makes more sense to check only in the coordinator,
> > > > > because it's the only node that directly accesses the metadata
> > > > > store (besides the overlord).
> > > > > So, the coordinator could skip updating the "used" flag if the
> > > > > overshadowing segments are not in the metadata store.
> > > > > In stream ingestion, segments might not be in the metadata store
> > > > > until they are published. However, this shouldn't be a problem
> > > > > because segments are always appended in stream ingestion.
> > > > >
> > > > > Jihoon
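(For illustration, a rough sketch of the coordinator-side guard Jihoon describes above: only mark an overshadowed segment as used=false when every segment overshadowing it is actually present in the metadata store. The class and method names are hypothetical stand-ins, not Druid's real DruidCoordinatorCleanupOvershadowed code.)

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the guard described above; not actual Druid code.
public class OvershadowedCleanupGuard
{
  /** Minimal stand-in for the coordinator's view of the druid_segments table. */
  public interface MetadataStoreView
  {
    boolean contains(String segmentId);
  }

  /**
   * Given the overshadowed segments the cleanup duty wants to mark used=false,
   * keep only those whose overshadowing segments are all known to the metadata
   * store. A segment overshadowed only by "unknown" segments (e.g. ones that
   * leaked in from another cluster's segment cache) is left untouched.
   */
  public static Set<String> safeToMarkUnused(
      Set<String> overshadowedCandidates,
      Map<String, Set<String>> overshadowedBy,   // candidate id -> ids of segments overshadowing it
      MetadataStoreView metadataStore
  )
  {
    return overshadowedCandidates.stream()
        .filter(id -> overshadowedBy.getOrDefault(id, Collections.emptySet())
                                    .stream()
                                    .allMatch(metadataStore::contains))
        .collect(Collectors.toSet());
  }
}
```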
> > > > > On Fri, Mar 1, 2019 at 12:49 AM David Glasser <glas...@apollographql.com> wrote:
> > > > >
> > > > > > (I sent this message to druid-user last week and got no response.
> > > > > > Since it proposes improvements to Druid, I thought it would be
> > > > > > appropriate to resend here. Hope that's OK.)
> > > > > >
> > > > > > We had a big outage in our Druid cluster last week. We run our
> > > > > > Druid servers in Kubernetes, and our historicals use machine-local
> > > > > > SSDs for their segment caches. We made the unfortunate choice to
> > > > > > have our production and staging historicals share the same pool of
> > > > > > machines, and today got bit by this for the first time.
> > > > > >
> > > > > > A production historical started up on a machine whose segment
> > > > > > cache contained segments from our staging cluster. Our prod and
> > > > > > staging clusters use the same names for data sources.
> > > > > >
> > > > > > This meant that these segments overshadowed production segments
> > > > > > which happened to have lower versions. Worse, when
> > > > > > DruidCoordinatorCleanupOvershadowed kicked in, all of the
> > > > > > production segments that were overshadowed got used=false set, and
> > > > > > quickly got dropped from historicals. This ended up being the
> > > > > > majority of our data. We eventually figured out what was going on
> > > > > > and did a bunch of manual steps to clean up (turning off and
> > > > > > clearing the cache of the two historicals that had staging
> > > > > > segments on them, manually setting used=true for all entries in
> > > > > > druid_segments, waiting a long, long time for data to
> > > > > > re-download), but figuring out what was going on was subtle (I was
> > > > > > very lucky I had randomly decided to read a lot of the code about
> > > > > > how the `used` column works and how versioned timelines are
> > > > > > calculated just a few days before!).
> > > > > >
> > > > > > (We were also lucky that we had turned off coordinator automatic
> > > > > > killing literally that morning!)
> > > > > >
> > > > > > I feel like Druid should have been able to protect me from this to
> > > > > > some degree. (Yes, we are going to address the root cause by
> > > > > > making it impossible for prod and staging to reuse each others'
> > > > > > disks.) Some thoughts on changes that could have helped:
> > > > > >
> > > > > > - Is it standard Druid practice to prepend the "cluster" name to
> > > > > > the data source name, so that conflicts like this are never
> > > > > > possible? We are certainly tempted to do this now, but nobody ever
> > > > > > told us to. If that's the standard, should it be documented?
> > > > > >
> > > > > > - Should clusters have an optional name/namespace, with
> > > > > > DataSegments recording that namespace and clusters refusing to
> > > > > > handle segments they find that are from a different namespace?
> > > > > > This would be like the common database setup where a single
> > > > > > server/cluster has a set of databases, which each have a set of
> > > > > > tables.
> > > > > >
> > > > > > - Should historicals refuse to announce segments that don't exist
> > > > > > in the druid_segments table, or should coordinators/brokers/etc.
> > > > > > refuse to pay attention to segments announced *by historicals*
> > > > > > that don't exist in the druid_segments table? I'm going to guess
> > > > > > this is difficult to do in the historical because the historical
> > > > > > probably doesn't actually talk to the SQL DB at all, but maybe it
> > > > > > could be done by the coordinator and broker?
> > > > > >
> > > > > > --dave
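(To make the last option above concrete: a broker or coordinator that already polls druid_segments could cross-check announced segments against that set and ignore anything unknown. A minimal sketch under that assumption, with hypothetical names; this is not how Druid currently behaves.)

```java
import java.util.Set;

// Hypothetical sketch of the "ignore unknown announcements" option; not actual Druid code.
public class AnnouncedSegmentFilter
{
  private final Set<String> knownSegmentIds;   // segment ids polled from druid_segments

  public AnnouncedSegmentFilter(Set<String> knownSegmentIds)
  {
    this.knownSegmentIds = knownSegmentIds;
  }

  /** Returns true if a segment announced by a historical should enter the timeline. */
  public boolean accept(String announcedSegmentId)
  {
    if (knownSegmentIds.contains(announcedSegmentId)) {
      return true;
    }
    // Warn and drop: the segment was announced but never published to the metadata
    // store, e.g. because it was written by a different cluster sharing the disk.
    System.err.println("Ignoring announced segment not present in druid_segments: " + announcedSegmentId);
    return false;
  }
}
```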