No worries... I mentioned the issue, not the JIRA number.

> On Oct 22, 2018, at 8:01 PM, Jeremiah D Jordan <jerem...@datastax.com> wrote:
>
> Sorry, maybe my spam filter got them or something, but I have never seen a
> JIRA number mentioned in the thread before this one. Just looked back
> through again to make sure, and this is the first email I have with one.
>
> -Jeremiah
>
>> On Oct 22, 2018, at 9:37 PM, sankalp kohli <kohlisank...@gmail.com> wrote:
>>
>> Here are some of the JIRAs which are marked fixed but did not actually fix
>> the issue. We have tried fixing this with several patches. Maybe it will be
>> fixed when Gossip is rewritten (CASSANDRA-12345). I should find or create a
>> new JIRA, as this issue still exists.
>> https://issues.apache.org/jira/browse/CASSANDRA-10366
>> https://issues.apache.org/jira/browse/CASSANDRA-10089 (related to it)
>>
>> Also, the quote you are using was written in a follow-on email. I had
>> already said which bug I was referring to:
>>
>> "Say you restarted all instances in the cluster and status for some host
>> goes missing. Now when you start a host replacement, the new host won’t
>> learn about the host whose status is missing and the view of this host will
>> be wrong."
>>
>> - CASSANDRA-10366
>>
>>
>> On Mon, Oct 22, 2018 at 7:22 PM Sankalp Kohli <kohlisank...@gmail.com>
>> wrote:
>>
>>> I will send the JIRAs of the bugs which we thought we had fixed but
>>> which still exist.
>>>
>>> Have you done any correctness testing after all of these tests? Have you
>>> run the tests on 1000-instance clusters?
>>>
>>> It is great that you have done these tests, and I am hoping the gossiping
>>> snitch is good. Also, was there any Gossip bug fixed post-3.0? Maybe I am
>>> seeing a bug which has since been fixed.
>>>
>>>> On Oct 22, 2018, at 7:09 PM, J. D. Jordan <jeremiah.jor...@gmail.com>
>>>> wrote:
>>>>
>>>> Do you have a specific gossip bug that you have seen recently which
>>>> caused a problem that would make this happen? Do you have a specific JIRA
>>>> in mind? “We can’t remove this because what if there is a bug” doesn’t
>>>> seem like a good enough reason to me. If that were a reason, we would
>>>> never make any changes to anything.
>>>>
>>>> I think many people have seen PFS actually cause real problems, whereas
>>>> with GPFS the issue being talked about is predicated on some theoretical
>>>> gossip bug happening.
>>>>
>>>> In the past year at DataStax we have done a lot of testing on 3.0 and
>>>> 3.11 around adding nodes, adding DCs, replacing nodes, replacing racks,
>>>> and replacing DCs, all while using GPFS, and as far as I know we have not
>>>> seen any “lost” rack/DC information during such testing.
>>>>
>>>> -Jeremiah
>>>>
>>>>> On Oct 22, 2018, at 5:46 PM, sankalp kohli <kohlisank...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> We will have similar issues with Gossip, but this will create more
>>>>> issues, as more things will rely on Gossip.
>>>>>
>>>>> I agree PFS should be removed, but I don't see how it can be with
>>>>> issues like these, unless someone proves that it won't cause any issues.
>>>>>
>>>>> On Mon, Oct 22, 2018 at 2:21 PM Paulo Motta <pauloricard...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I can understand keeping PFS for historical/compatibility reasons, but
>>>>>> if gossip is broken I think you will have similar ring view problems
>>>>>> during replace/bootstrap that would still occur with the use of PFS
>>>>>> (such as missing tokens, since those are propagated via gossip), so
>>>>>> that doesn't seem like a strong reason to keep it around.
>>>>>>
>>>>>> With PFS it's pretty easy to shoot yourself in the foot if you're not
>>>>>> careful enough to keep identical files across nodes and to update them
>>>>>> when adding nodes/DCs, so it seems less foolproof than other snitches.
>>>>>> While the rejection of verbs to invalid replicas on trunk could address
>>>>>> the concerns raised by Jeremy, this would only happen after the new
>>>>>> node joins the ring, so you would need to re-bootstrap the node and
>>>>>> lose all the work done in the original bootstrap.
>>>>>>
>>>>>> Perhaps one good reason to use PFS is the ability to easily package it
>>>>>> across multiple nodes, as pointed out by Sean Durity on CASSANDRA-10745
>>>>>> (which is also its Achilles' heel). To keep this ability, we could make
>>>>>> GPFS compatible with the cassandra-topology.properties file, but read
>>>>>> only the dc/rack info about the local node.
>>>>>>
>>>>>> On Mon, Oct 22, 2018 at 4:58 PM sankalp kohli
>>>>>> <kohlisank...@gmail.com> wrote:
>>>>>>
>>>>>>> Yes, it will happen. I am worried that DC or rack info can go missing
>>>>>>> the same way.
>>>>>>>
>>>>>>> On Mon, Oct 22, 2018 at 12:52 PM Paulo Motta
>>>>>>> <pauloricard...@gmail.com> wrote:
>>>>>>>
>>>>>>>>> the new host won’t learn about the host whose status is missing and
>>>>>>>>> the view of this host will be wrong.
>>>>>>>>
>>>>>>>> Won't this happen even with PropertyFileSnitch, as the token(s) for
>>>>>>>> this host will be missing from gossip/system.peers?
>>>>>>>>
>>>>>>>> On Sat, Oct 20, 2018 at 12:34 AM Sankalp Kohli
>>>>>>>> <kohlisank...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Say you restarted all instances in the cluster and status for some
>>>>>>>>> host goes missing. Now when you start a host replacement, the new
>>>>>>>>> host won’t learn about the host whose status is missing and the
>>>>>>>>> view of this host will be wrong.
>>>>>>>>>
>>>>>>>>> PS: I will be happy to be proved wrong, as I can also start using
>>>>>>>>> the Gossip snitch :)
>>>>>>>>>
>>>>>>>>>> On Oct 19, 2018, at 2:41 PM, Jeremy Hanna
>>>>>>>>>> <jeremy.hanna1...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Do you mean to say that during host replacement there may be a
>>>>>>>>>> time when the old->new host mapping isn’t fully propagated and
>>>>>>>>>> therefore wouldn’t yet be in all system tables?
>>>>>>>>>>
>>>>>>>>>>> On Oct 17, 2018, at 4:20 PM, sankalp kohli
>>>>>>>>>>> <kohlisank...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> This is not the case during host replacement, correct?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 16, 2018 at 10:04 AM Jeremiah D Jordan
>>>>>>>>>>> <jeremiah.jor...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> As long as we are correctly storing such things in the system
>>>>>>>>>>>> tables and reading them out of the system tables when we do not
>>>>>>>>>>>> have the information from gossip yet, it should not be a problem.
>>>>>>>>>>>> (As far as I know GPFS does this, but I have not done extensive
>>>>>>>>>>>> code diving or testing to make sure all edge cases are covered
>>>>>>>>>>>> there.)
>>>>>>>>>>>>
>>>>>>>>>>>> -Jeremiah
>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 16, 2018, at 11:56 AM, sankalp kohli
>>>>>>>>>>>>> <kohlisank...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Will GossipingPropertyFileSnitch not be vulnerable to Gossip
>>>>>>>>>>>>> bugs where we lose hostId or some other fields when we restart
>>>>>>>>>>>>> C* for large clusters (~1000 instances)?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 16, 2018 at 7:59 AM Jeff Jirsa <jji...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We should, but the 4.0 features that log/reject verbs to
>>>>>>>>>>>>>> invalid replicas solve a lot of the concerns here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Jeff Jirsa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 16, 2018, at 4:10 PM, Jeremy Hanna
>>>>>>>>>>>>>>> <jeremy.hanna1...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We have had PropertyFileSnitch for a long time, even though
>>>>>>>>>>>>>>> GossipingPropertyFileSnitch is effectively a superset of what
>>>>>>>>>>>>>>> it offers and is much less error prone. There are some
>>>>>>>>>>>>>>> unexpected behaviors when things aren’t configured correctly
>>>>>>>>>>>>>>> with PFS. For example, if you replace nodes in one DC and add
>>>>>>>>>>>>>>> those nodes to that DC's property files but not the other
>>>>>>>>>>>>>>> DCs' property files, the resulting problems aren’t very
>>>>>>>>>>>>>>> straightforward to troubleshoot.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We could try to improve the resilience, fail-fast error
>>>>>>>>>>>>>>> checking, and error reporting of PFS, but honestly, why
>>>>>>>>>>>>>>> wouldn’t we deprecate and remove PropertyFileSnitch? Are
>>>>>>>>>>>>>>> there reasons why GPFS wouldn’t be sufficient to replace it?
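Paulo's suggestion amounts to deriving the node-local GPFS file from the
cluster-wide PFS one. A minimal migration sketch, assuming the standard
"ip=DC:RACK" entry format of cassandra-topology.properties; the addresses,
DC/rack names, and working-directory paths below are illustrative, not real
deployment values:

```shell
# Sketch: derive this node's cassandra-rackdc.properties (GPFS) from a
# cluster-wide cassandra-topology.properties (PFS). The sample file created
# here stands in for the real one under the Cassandra conf directory.
cat > cassandra-topology.properties <<'EOF'
10.0.1.5=DC1:RAC1
10.0.1.6=DC1:RAC2
default=DC1:RAC1
EOF

LOCAL_IP=10.0.1.5   # assumption: this node's broadcast address
entry=$(grep "^${LOCAL_IP}=" cassandra-topology.properties | cut -d= -f2)
dc=${entry%%:*}     # text before the colon is the datacenter
rack=${entry##*:}   # text after the colon is the rack
printf 'dc=%s\nrack=%s\n' "$dc" "$rack" > cassandra-rackdc.properties
cat cassandra-rackdc.properties   # -> dc=DC1 / rack=RAC1
```

After writing the node-local file, switching endpoint_snitch in
cassandra.yaml to GossipingPropertyFileSnitch would complete the migration on
that node; since GPFS falls back to cassandra-topology.properties when the
file is present, the old file should be removed afterwards so stale entries
cannot mask gossip state.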
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org