How do we expose this for the already GA’ed 4.1.0-4.1.5 releases, which are already in use out in the world?
I would be more worried about those than about the as-yet-unreleased 5.0.0, which likely won't be in production anywhere until at least a few weeks after GA, if not months in most shops.
Seems like we already have a proposed fix with a patch posted to the ticket. Is someone who knows the paxos v2 code able to review and commit that if it is correct?


On Jun 27, 2024, at 1:39 PM, Jon Haddad <j...@jonhaddad.com> wrote:


For those that want to go ahead, how do you disclose to the community that there’s a serious risk to availability?
 
Jon


On Thu, Jun 27, 2024 at 7:52 PM Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
It definitely looks like a good thing to investigate and fix.  However, it's not a regression and not new in 5.0.  I think we should push forward with 5.0 and fix/release it separately in a 4.1.x and 5.0.x release.

> On Jun 27, 2024, at 12:46 PM, Brandon Williams <dri...@gmail.com> wrote:
>
> I don't know that we expect to fix anything if we don't know it is
> affected yet. ¯\_(ツ)_/¯
>
> Kind Regards,
> Brandon
>
> On Thu, Jun 27, 2024 at 12:37 PM Aleksey Yeshchenko <alek...@apple.com> wrote:
>>
>> Not voting on this; however, if we expect to fix something specific between an RC and GA, then we shouldn’t be starting a vote on an RC. In that case it should be another beta.
>>
>>> On 27 Jun 2024, at 18:30, Brandon Williams <dri...@gmail.com> wrote:
>>>
>>> The last time paxos v2 blocked us in CASSANDRA-19617 which also
>>> affected 4.1, I didn't get a sense of strong usage from the community,
>>> so I agree that RC shouldn't be blocked but this can get fixed before
>>> GA.  +1 from me.
>>>
>>> Kind Regards,
>>> Brandon
>>>
>>> On Tue, Jun 25, 2024 at 11:11 PM Jon Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>> 5.0 is a massive milestone.  A huge thank you to everyone that's invested their time into the release.  I've done a lot of testing, benchmarking, and tire kicking and it's truly mind blowing how much has gone into 5.0 and how great it is for the community.
>>>>
>>>> I am a bit concerned that CASSANDRA-19668, which I found in 4.1, will also affect 5.0.  This is a pretty serious bug, where using Paxos v2 + off heap memtables can cause a SIGSEGV process crash.  I've seen this happen about a dozen times with a client over the last 3 months.  Since the new trie memtables rely on off heap, and both Trie memtables & Paxos V2 are so compelling (esp for multi-dc users), I think there's a good chance that we'll be making an already bad problem even worse for folks that use LWT.
>>>>
>>>> Unfortunately, until next week I'm unable to put any time into this; I'm on vacation with my family.  I wish I had been able to confirm and raise this issue as a 5.0 blocker sooner, but I've deliberately tried to keep work stuff out of my mind.   Since I'm not 100% sure if this affects 5.0, I'm not blocking the RC, but I don't feel comfortable putting a +1 on a release that I'm at least 80% certain contains a process-crashing bug.
>>>>
>>>> I have a simple 4.1 patch in CASSANDRA-19668, but I haven't landed a commit in several years and I have zero recollection of the entire process of getting it in, nor have I spent any time writing unit or dtests in the C* repo.  I ran a test of 160MM LWTs over several hours with my 4.1 branch and didn't hit any issues, but my client ran for weeks without hitting it so it's hard to say if I've actually addressed the problem, as it's a rare race condition.  Fwiw, I don't need to be the one to handle CASSANDRA-19668, so if someone wants to address it before me, please feel free.  It will likely take me a lot longer to deal with than someone more involved with the process, and I'd want 2 sets of eyes on it anyways given what I already mentioned previously about committing and testing.
>>>>
>>>> Jon
>>>>
>>>>
>>>> On Tue, Jun 25, 2024 at 2:53 PM Mick Semb Wever <m...@apache.org> wrote:
>>>>>
>>>>>
>>>>>> Proposing the test build of Cassandra 5.0-rc1 for release.
>>>>>>
>>>>>> sha1: b43f0b2e9f4cb5105764ef9cf4ece404a740539a
>>>>>> Git: https://github.com/apache/cassandra/tree/5.0-rc1-tentative
>>>>>> Maven Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1336/org/apache/cassandra/cassandra-all/5.0-rc1/
>>>>>
>>>>>
>>>>>
>>>>> The three green CI runs for this are
>>>>> - https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-2
>>>>> - https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-3
>>>>> - https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-4
>>>>>
>>>>>
>>
