I can understand keeping PFS for historical/compatibility reasons, but if
gossip is broken I think you will hit similar ring view problems during
replace/bootstrap even with PFS (such as missing tokens, since those are
propagated via gossip), so that doesn't seem like a strong reason to keep
it around.

With PFS it's pretty easy to shoot yourself in the foot if you're not
careful enough to keep identical files across nodes and update them when
adding nodes/DCs, so it seems to be less foolproof than other snitches.
While the rejection of verbs to invalid replicas on trunk could address
the concerns raised by Jeremy, this would only happen after the new node
joins the ring, so you would need to re-bootstrap the node and lose all the
work done in the original bootstrap.
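
To make the "identical files across nodes" pitfall concrete, here is a
minimal sketch (plain Python, not Cassandra code) of the kind of check an
operator would have to run to catch drift. It assumes the contents of each
node's cassandra-topology.properties have already been collected somehow;
the function names and the node labels are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_by_digest(files_by_node):
    """Group node names by the SHA-256 digest of their topology file text."""
    groups = defaultdict(list)
    for node, content in sorted(files_by_node.items()):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        groups[digest].append(node)
    return dict(groups)


def files_are_identical(files_by_node):
    """True when every node carries byte-identical file contents."""
    return len(group_by_digest(files_by_node)) <= 1


if __name__ == "__main__":
    # Hypothetical example: node3 was updated with a new DC, the others were not.
    files = {
        "node1": "10.0.0.1=DC1:RAC1\n",
        "node2": "10.0.0.1=DC1:RAC1\n",
        "node3": "10.0.0.1=DC1:RAC1\n10.1.0.1=DC2:RAC1\n",
    }
    for digest, nodes in group_by_digest(files).items():
        print(digest[:12], nodes)
    print("identical:", files_are_identical(files))
```

With GPFS no such cross-node check is needed, since each node only declares
its own dc/rack.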

Perhaps one good reason to use PFS is the ability to easily package it
across multiple nodes, as pointed out by Sean Durity on CASSANDRA-10745
(which is also its Achilles' heel). To keep this ability, we could make
GPFS compatible with the cassandra-topology.properties file, but reading
only the dc/rack info about the local node.
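
A rough sketch of that idea, in Python rather than in the snitch code
itself: parse a topology file in the usual `address=DC:RACK` form and return
only the entry for the local node, ignoring every other line. The function
name, the sample addresses and the fallback to a `default` entry are
assumptions made for illustration, and escaped IPv6 keys are not handled.

```python
def local_dc_rack(topology_text, local_address):
    """Return (dc, rack) for local_address; other nodes' entries are ignored."""
    entries = {}
    for raw in topology_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        entries[key.strip()] = value.strip()

    # Assumption: fall back to the "default" entry when the node is not listed.
    value = entries.get(local_address, entries.get("default"))
    if value is None:
        raise ValueError("no entry for %s and no default" % local_address)

    dc, sep, rack = value.partition(":")
    if not sep:
        raise ValueError("expected DC:RACK, got %r" % value)
    return dc.strip(), rack.strip()


SAMPLE = """\
# hypothetical cassandra-topology.properties
10.0.0.1=DC1:RAC1
10.0.0.2=DC1:RAC2
10.1.0.1=DC2:RAC1
default=DC1:RAC1
"""

if __name__ == "__main__":
    print(local_dc_rack(SAMPLE, "10.0.0.2"))
```

The point of the sketch is that a stale or missing line for some other node
cannot change the result, so the same file can still be shipped to every
node without the consistency requirement PFS imposes.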

On Mon, Oct 22, 2018 at 16:58, sankalp kohli <kohlisank...@gmail.com>
wrote:

> Yes it will happen. I am worried that same way DC or rack info can go
> missing.
>
> On Mon, Oct 22, 2018 at 12:52 PM Paulo Motta <pauloricard...@gmail.com>
> wrote:
>
> > > the new host won’t learn about the host whose status is missing and the
> > > view of this host will be wrong.
> >
> > Won't this happen even with PropertyFileSnitch as the token(s) for this
> > host will be missing from gossip/system.peers?
> >
> > On Sat, Oct 20, 2018 at 00:34, Sankalp Kohli <kohlisank...@gmail.com>
> > wrote:
> >
> > > Say you restarted all instances in the cluster and status for some host
> > > goes missing. Now when you start a host replacement, the new host won’t
> > > learn about the host whose status is missing and the view of this host
> > > will be wrong.
> > >
> > > PS: I will be happy to be proved wrong as I can also start using Gossip
> > > snitch :)
> > >
> > > > On Oct 19, 2018, at 2:41 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Do you mean to say that during host replacement there may be a time
> > > > > when the old->new host isn’t fully propagated and therefore wouldn’t
> > > > > yet be in all system tables?
> > > > >
> > > > >> On Oct 17, 2018, at 4:20 PM, sankalp kohli <kohlisank...@gmail.com>
> > > > >> wrote:
> > > >>
> > > >> This is not the case during host replacement correct?
> > > >>
> > > >> On Tue, Oct 16, 2018 at 10:04 AM Jeremiah D Jordan <
> > > >> jeremiah.jor...@gmail.com> wrote:
> > > >>
> > > > >>> As long as we are correctly storing such things in the system tables
> > > > >>> and reading them out of the system tables when we do not have the
> > > > >>> information from gossip yet, it should not be a problem. (As far as I
> > > > >>> know GPFS does this, but I have not done extensive code diving or
> > > > >>> testing to make sure all edge cases are covered there)
> > > >>>
> > > >>> -Jeremiah
> > > >>>
> > > > >>>> On Oct 16, 2018, at 11:56 AM, sankalp kohli <kohlisank...@gmail.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>> Will GossipingPropertyFileSnitch not be vulnerable to Gossip bugs
> > > > >>>> where we lose hostId or some other fields when we restart C* for
> > > > >>>> large clusters (~1000 instances)?
> > > > >>>>
> > > > >>>>> On Tue, Oct 16, 2018 at 7:59 AM Jeff Jirsa <jji...@gmail.com>
> > > > >>>>> wrote:
> > > >>>>>
> > > >>>>> We should, but the 4.0 features that log/reject verbs to invalid
> > > >>> replicas
> > > >>>>> solves a lot of the concerns here
> > > >>>>>
> > > >>>>> --
> > > >>>>> Jeff Jirsa
> > > >>>>>
> > > >>>>>
> > > > >>>>>> On Oct 16, 2018, at 4:10 PM, Jeremy Hanna
> > > > >>>>>> <jeremy.hanna1...@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>> We have had PropertyFileSnitch for a long time even though
> > > > >>>>>> GossipingPropertyFileSnitch is effectively a superset of what it
> > > > >>>>>> offers and is much less error prone.  There are some unexpected
> > > > >>>>>> behaviors when things aren’t configured correctly with PFS.  For
> > > > >>>>>> example, if you replace nodes in one DC and add those nodes to
> > > > >>>>>> that DCs property files and not the other DCs property files -
> > > > >>>>>> the resulting problems aren’t very straightforward to
> > > > >>>>>> troubleshoot.
> > > >>>>>>
> > > > >>>>>> We could try to improve the resilience and fail fast error
> > > > >>>>>> checking and error reporting of PFS, but honestly, why wouldn’t
> > > > >>>>>> we deprecate and remove PropertyFileSnitch?  Are there reasons
> > > > >>>>>> why GPFS wouldn’t be sufficient to replace it?
> > > >>>>>>
> > > ---------------------------------------------------------------------
> > > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > >>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > ---------------------------------------------------------------------
> > > >>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > >>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > >>>>>
> > > >>>>>
> > > >>>
> > > >>>
> > > >>>
> ---------------------------------------------------------------------
> > > >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > >>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > >>>
> > > >>>
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > >
> >
>
