Sorry, maybe my spam filter got them or something, but I have never seen a JIRA number mentioned in the thread before this one. Just looked back through again to make sure, and this is the first email I have with one.
-Jeremiah

> On Oct 22, 2018, at 9:37 PM, sankalp kohli <kohlisank...@gmail.com> wrote:
>
> Here are some of the JIRAs which are marked fixed but did not actually fix the issue. We have tried fixing this with several patches. Maybe it will be fixed when Gossip is rewritten (CASSANDRA-12345). I should find or create a new JIRA, as this issue still exists.
> https://issues.apache.org/jira/browse/CASSANDRA-10366
> https://issues.apache.org/jira/browse/CASSANDRA-10089 (related to it)
>
> Also, the quote you are using was written in a follow-on email. I have already said which bug I was referring to:
>
> "Say you restarted all instances in the cluster and status for some host goes missing. Now when you start a host replacement, the new host won’t learn about the host whose status is missing and the view of this host will be wrong."
>
> - CASSANDRA-10366
>
> On Mon, Oct 22, 2018 at 7:22 PM Sankalp Kohli <kohlisank...@gmail.com> wrote:
>
>> I will send the JIRAs of the bugs which we thought we had fixed but which still exist.
>>
>> Have you done any correctness testing after doing all these tests... have you done the tests on 1000-instance clusters?
>>
>> It is great you have done these tests, and I am hoping the gossiping snitch is good. Also, was there any Gossip bug fixed post 3.0? Maybe I am seeing a bug which has since been fixed.
>>
>>> On Oct 22, 2018, at 7:09 PM, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
>>>
>>> Do you have a specific gossip bug that you have seen recently which caused a problem that would make this happen? Do you have a specific JIRA in mind? "We can’t remove this because what if there is a bug" doesn’t seem like a good enough reason to me. If that were a reason, we would never make any changes to anything.
>>>
>>> I think many people have seen PFS actually cause real problems, whereas with GPFS the issue being talked about is predicated on some theoretical gossip bug happening.
>>>
>>> In the past year at DataStax we have done a lot of testing on 3.0 and 3.11 around adding nodes, adding DCs, replacing nodes, replacing racks, and replacing DCs, all while using GPFS, and as far as I know we have not seen any "lost" rack/DC information during such testing.
>>>
>>> -Jeremiah
>>>
>>>> On Oct 22, 2018, at 5:46 PM, sankalp kohli <kohlisank...@gmail.com> wrote:
>>>>
>>>> We will have similar issues with Gossip, but this will create more issues as more things will rely on Gossip.
>>>>
>>>> I agree PFS should be removed, but I don't see how it can be with issues like these, unless someone proves that it won't cause any issues.
>>>>
>>>> On Mon, Oct 22, 2018 at 2:21 PM Paulo Motta <pauloricard...@gmail.com> wrote:
>>>>
>>>>> I can understand keeping PFS for historical/compatibility reasons, but if gossip is broken I think you will have similar ring-view problems during replace/bootstrap that would still occur with the use of PFS (such as missing tokens, since those are propagated via gossip), so that doesn't seem like a strong reason to keep it around.
>>>>>
>>>>> With PFS it's pretty easy to shoot yourself in the foot if you're not careful enough to have identical files across nodes and to update them when adding nodes/DCs, so it seems to be less foolproof than other snitches.
>>>>> While the rejection of verbs to invalid replicas on trunk could address the concerns raised by Jeremy, this would only happen after the new node joins the ring, so you would need to re-bootstrap the node and lose all the work done in the original bootstrap.
>>>>>
>>>>> Perhaps one good reason to use PFS is the ability to easily package it across multiple nodes, as pointed out by Sean Durity on CASSANDRA-10745 (which is also its Achilles' heel). To keep this ability, we could make GPFS compatible with the cassandra-topology.properties file, but reading only the dc/rack info about the local node.
>>>>>
>>>>> On Mon, 22 Oct 2018 at 16:58, sankalp kohli <kohlisank...@gmail.com> wrote:
>>>>>
>>>>>> Yes, it will happen. I am worried that DC or rack info can go missing in the same way.
>>>>>>
>>>>>> On Mon, Oct 22, 2018 at 12:52 PM Paulo Motta <pauloricard...@gmail.com> wrote:
>>>>>>
>>>>>>>> the new host won’t learn about the host whose status is missing and the view of this host will be wrong.
>>>>>>>
>>>>>>> Won't this happen even with PropertyFileSnitch, as the token(s) for this host will be missing from gossip/system.peers?
>>>>>>>
>>>>>>> On Sat, 20 Oct 2018 at 00:34, Sankalp Kohli <kohlisank...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Say you restarted all instances in the cluster and status for some host goes missing. Now when you start a host replacement, the new host won’t learn about the host whose status is missing and the view of this host will be wrong.
>>>>>>>>
>>>>>>>> PS: I will be happy to be proved wrong, as I can also start using the Gossip snitch :)
>>>>>>>>
>>>>>>>>> On Oct 19, 2018, at 2:41 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Do you mean to say that during host replacement there may be a time when the old->new host isn’t fully propagated and therefore wouldn’t yet be in all system tables?
>>>>>>>>>
>>>>>>>>>> On Oct 17, 2018, at 4:20 PM, sankalp kohli <kohlisank...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> This is not the case during host replacement, correct?
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 16, 2018 at 10:04 AM Jeremiah D Jordan <jeremiah.jor...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> As long as we are correctly storing such things in the system tables and reading them out of the system tables when we do not have the information from gossip yet, it should not be a problem. (As far as I know GPFS does this, but I have not done extensive code diving or testing to make sure all edge cases are covered there.)
>>>>>>>>>>>
>>>>>>>>>>> -Jeremiah
>>>>>>>>>>>
>>>>>>>>>>>> On Oct 16, 2018, at 11:56 AM, sankalp kohli <kohlisank...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Will GossipingPropertyFileSnitch not be vulnerable to Gossip bugs where we lose hostId or some other fields when we restart C* for large clusters (~1000 instances)?
>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 16, 2018 at 7:59 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> We should, but the 4.0 features that log/reject verbs to invalid replicas solve a lot of the concerns here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jeff Jirsa
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 16, 2018, at 4:10 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have had PropertyFileSnitch for a long time, even though GossipingPropertyFileSnitch is effectively a superset of what it offers and is much less error prone. There are some unexpected behaviors when things aren’t configured correctly with PFS. For example, if you replace nodes in one DC and add those nodes to that DC's property files but not the other DCs' property files, the resulting problems aren’t very straightforward to troubleshoot.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We could try to improve the resilience, fail-fast error checking, and error reporting of PFS, but honestly, why wouldn’t we deprecate and remove PropertyFileSnitch? Are there reasons why GPFS wouldn’t be sufficient to replace it?
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org