Wanted to add a few issues that I remembered:

  * https://issues.apache.org/jira/browse/CASSANDRA-12811 - data
resurrection; it was marked as Normal because it was discovered with a
test, but it should've been marked as Critical.
  * https://issues.apache.org/jira/browse/CASSANDRA-12956 - data loss
(commit log isn't replayed on custom 2i exception)
  * https://issues.apache.org/jira/browse/CASSANDRA-12144 - a problem
with undeletable/duplicate rows; can be considered data resurrection
and/or sstable corruption.
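For anyone wanting to slice a list like this the way the charts downthread do, here is a minimal sketch of the grouping step. The resolution dates, component names, and type labels below are illustrative assumptions for the three tickets above, not values pulled from JIRA:

```python
from collections import Counter
from datetime import date

# Hypothetical records: (issue key, resolution date, component, defect type).
# The dates and labels are assumed for illustration, not taken from JIRA.
defects = [
    ("CASSANDRA-12811", date(2017, 2, 1), "compaction", "data resurrection"),
    ("CASSANDRA-12956", date(2017, 1, 1), "commit log", "data loss"),
    ("CASSANDRA-12144", date(2016, 8, 1), "read path", "data resurrection"),
]

# Group the same records three ways: per month, per component, per type.
by_month = Counter(d.strftime("%Y-%m") for _, d, _, _ in defects)
by_component = Counter(c for _, _, c, _ in defects)
by_type = Counter(t for _, _, _, t in defects)

print(by_month)
print(by_component)
print(by_type)
```

The same Counter approach extends to a full JQL export (e.g. a CSV of every fixversion 4.0 defect) once the records are parsed into tuples.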



On Thu, May 7, 2020 at 6:55 PM Joshua McKenzie <jmcken...@apache.org> wrote:

> "ML is plaintext bro" - thanks Mick. ಠ_ಠ
>
> Since we're stuck in the late '90s, here are some links to a gsheet:
>
> Defects by month:
> https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=1584867240
> Defects by component:
> https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=1946109279
> Defects by type:
> https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=385136105
>
> On Thu, May 7, 2020 at 12:31 PM Joshua McKenzie <joshua.mcken...@gmail.com>
> wrote:
>
>> Hearing that the images got killed by the web server. Trying from Gmail
>> (sorry for the spam). Time to see if it's the Apache SMTP server or the
>> list culling images:
>>
>> -------------------------------------------
>> I did a little analysis on this data (any defect marked with fixversion
>> 4.0 that rose to the level of critical in terms of availability,
>> correctness, or corruption/loss) and charted some things the rest of the
>> project community might find interesting:
>>
>> 1: Critical (availability, correctness, corruption/loss) defects fixed
>> per month since about 6 months before 3.11.0:
>> [image: monthly.png]
>>
>> 2: Components in which critical defects arose (note: bright red bar ==
>> sum of 3 dark red):
>> [image: Total Defects by Component.png]
>>
>> 3: Type of defect found and fixed (bright red: cluster down or permaloss,
>> dark red: temp corrupt/loss, yellow: incorrect response):
>>
>> [image: Total Defects by Type.png]
>>
>> My personal takeaways from this: a ton of great defect-fixing work has
>> gone into 4.0. I'd love it if we had both code coverage analysis for
>> testing on the codebase and data to surface defect hotspots in the code
>> that might need further testing (caveat: many in this project community
>> have voiced skepticism about the value of this type of data in the past,
>> so that's probably a conversation for another thread).
>>
>> Hope someone else finds the above interesting if not useful.
>>
>> --
>> Joshua McKenzie
>>
>> On Thu, May 7, 2020 at 12:24 PM Joshua McKenzie <jmcken...@apache.org>
>> wrote:
>>
>>> ~Josh
>>>
>>>
>>> On Wed, May 6, 2020 at 3:38 PM Dinesh Joshi <djo...@apache.org> wrote:
>>>
>>>> Hi Sankalp,
>>>>
>>>> Thanks for bringing this up. At the very minimum, I hope we have
>>>> regression tests for the specific issues we have fixed.
>>>>
>>>> I personally think the project should focus on building a
>>>> comprehensive test suite. However, some of these issues can only be
>>>> detected at scale. We need users to test* C* in their environments for
>>>> their use cases. Ideally, these folks would stand up large clusters,
>>>> tee their traffic to the new cluster, and report issues.
>>>>
>>>> If we had an automated test suite that everyone could run at large
>>>> scale, that would be even better.
>>>>
>>>> Thanks,
>>>>
>>>> Dinesh
>>>>
>>>>
>>>> * test != starting C* in a few nodes and looking at logs.
>>>>
>>>> > On May 6, 2020, at 10:11 AM, sankalp kohli <kohlisank...@gmail.com>
>>>> wrote:
>>>> >
>>>> > Hi,
>>>> >    I want to share some of the serious issues that were found and
>>>> > fixed in 3.0.x. I have created this list from JIRA to help us
>>>> > identify areas for validating 4.0. This will also give some insight
>>>> > to the dev community.
>>>> >
>>>> > Let us know if anyone has suggestions on how to better use this data
>>>> > in validating 4.0. Also, this list might be missing some issues
>>>> > identified early on in 3.0.x, or some of the latest ones.
>>>> >
>>>> > Link: https://tinyurl.com/30seriousissues
>>>> >
>>>> > Thanks,
>>>> > Sankalp
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>
>>>>

-- 
alex p
